Inference of gene regulatory networks from time series by Tsallis entropy

Background The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed. Results In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. 
The obtained best free parameter of the Tsallis entropy was on average in the range 2.5 ≤ q ≤ 3.5 (hence, subextensive entropy), which opens new perspectives for GRNs inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.


Background
In general, living organisms can be viewed as networks of molecules connected by chemical reactions. More specifically, cell control involves the activity of several related genes through gene networks, and the relationships among them are still largely unknown. The inference or reverse-engineering of such gene networks is very important to uncover the functional relationships among genes and can be defined as the identification of gene interactions from experimental data through computational analysis [1].
Gene regulatory networks (GRNs) are used to indicate the interrelation among genes at the genomic level [2]. Such information is very important for disease treatment design, drug development and for understanding the activity of living organisms at the molecular level. In fact, there is a strong motivation for the inference of GRNs from gene expression patterns, which motivated, e.g., the DREAM project [3].
The development of techniques for sampling the expression levels of genes over time has increased the possibility of important advances in the understanding of regulatory mechanisms of gene transcription and protein synthesis. In this context, an important task is the study and identification of high-level properties of gene networks and their interactions, without the necessity of low-level biochemical descriptions. It is not the objective of this work to analyze a detailed biochemical model. The objective is to recover the gene connections in a global and simple way, by identifying the most significant connections (relationships).

* Correspondence: fabricio@utfpr.edu.br. 1 Federal University of Technology - Paraná, Brazil. Full list of author information is available at the end of the article.
While it is not possible to infer the network topology with great accuracy using only gene expression measurements, mainly due to the short sample sets, the high dimension of the system (i.e., the number of genes) and its complexity [4], such inferences can be very important for planning experiments and/or for focusing on small meaningful subgroups of genes, thus reducing the complexity of the problem.
We are interested in the inference of network topology from temporal expression profiles by minimizing the conditional entropy between the genes, i.e., the entropy of a gene conditioned on the state of other genes. Given a gene, the idea is to set as predictors the genes that minimize its entropy. Therefore, the conditional entropy works as a criterion function to be minimized. As in a typical machine learning problem, the quality of the inference depends on the data and on the criterion function. If the data are not representative, the obtained solution will probably be a local minimum rather than a global one. Similarly, if the criterion function is not suitable, the solution may only partially satisfy the constraints imposed by the data or may even be wrong. Of course, since the criterion function follows the properties of the entropy concept, a completely wrong solution is not expected: if the observation of some genes reduces the uncertainty on the target gene, the prediction accuracy is improved. However, the solution may not be the best or optimal one, which raises the question: what is the best entropy function for the inference of GRNs?
In this paper we address this question by presenting a new criterion function for the inference of GRNs, in order to examine the sensitivity of the minimum conditional entropy to its functional form. The generalized entropy functional form proposed by Tsallis [5] is adopted, which not only recovers the Shannon form but also presents properties required by the theory of Statistical Physics. These properties are related to the principles of Thermodynamics, to the concept of stability and to its axiomatic foundations [6].
A variety of statistical methods to infer network topology have been applied to gene expression data [1,7-20]. The results are often evaluated by comparing predicted couplings with those known from biological databases. While this procedure can elucidate or corroborate inferred interactions between some pairs of genes, it makes the false detection rate difficult to estimate [4], which in turn makes the validation process very difficult. As it is not always possible to assure the quality of inference methods analytically, mainly in highly complex systems, it is very important to use computational experiments for this purpose. Besides, in such experiments (simulations) it is possible to investigate prior information, such as topology classes (e.g., random or scale-free networks) or the system dynamics. Therefore, an Artificial Gene Network (AGN) platform [21,22] and the DREAM4 in silico network challenge [3] are explored in the present paper in order to assess the GRN inference process based on the generalized entropy.

Experiments
In order to verify the effect of the entropic parameter q, we carried out inference experiments considering two types of network topologies: the uniformly-random Erdös-Rényi (ER) and the scale-free Barabási-Albert (BA) models [23-25]. In the ER model each connection (edge) is present with equal probability, in such a way that the connectivity of the genes follows a Binomial or Poisson distribution with mean 〈k〉. On the other hand, in the BA model the probability of a new node $v_j$ being connected to the node $v_i$ is proportional to the connectivity of $v_i$, which produces a power law in the probability distribution of the connectivity.
The data set $D_T$ was generated according to Sec. 4.3.2 with N = 100 (the number of genes). For each type of network model, 10 sequences of 30 transitions each were generated by applying Boolean transition functions, starting from random initial states. Then, the 10 segments were concatenated into a single expression profile, which was submitted to the network inference method. The inference was made by means of Equation 6 with q varying from 0.1 to 3.1 in steps of 0.1 and from 3.1 to 10.1 in steps of 0.5, i.e., the similarity between the source and the inferred AGN was calculated for each q in this range.
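The generation of such a concatenated Boolean expression profile can be sketched as follows. This is only an illustration under simplifying assumptions and not the AGN generator of [21,22]: for brevity every gene receives exactly k randomly chosen predictors (so the topology is neither the ER nor the BA model), each transition function is a randomly drawn truth table, and the names `random_boolean_network`, `step` and `simulate` are hypothetical.

```python
import random


def random_boolean_network(n_genes, k, rng):
    """Assumption: every gene has exactly k predictors and a random truth table."""
    network = []
    for _ in range(n_genes):
        predictors = rng.sample(range(n_genes), k)
        truth_table = [rng.randint(0, 1) for _ in range(2 ** k)]
        network.append((predictors, truth_table))
    return network


def step(state, network):
    """Apply every Boolean transition function once (synchronous update)."""
    next_state = []
    for predictors, truth_table in network:
        index = 0
        for p in predictors:
            index = (index << 1) | state[p]  # predictor states form the table index
        next_state.append(truth_table[index])
    return next_state


def simulate(network, n_sequences, n_transitions, rng):
    """Concatenate n_sequences runs, each started from a random initial state."""
    profile = []
    for _ in range(n_sequences):
        state = [rng.randint(0, 1) for _ in network]
        profile.append(state)
        for _ in range(n_transitions):
            state = step(state, network)
            profile.append(state)
    return profile


rng = random.Random(0)
network = random_boolean_network(n_genes=100, k=2, rng=rng)
profile = simulate(network, n_sequences=10, n_transitions=30, rng=rng)
print(len(profile), len(profile[0]))
```

Each run of 30 transitions contributes its initial state plus 30 successor states to the concatenated profile.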
The similarity curves shown in Figure 1 were obtained by averaging 50 runs (different source networks) for each network model. In both network models improvements were observed in the similarity by varying q, with the maximum 〈Similarity(A, B)〉 being reached at q ≠ 1 for all tried 〈k〉. Besides, it can also be noted that the q* that maximizes the similarity seems to be almost independent of the network model and of the average connectivity. Figures 2(a) and 2(b) show the boxplots of the similarity values for each value of q and k. It is possible to notice a very small variation in the boxplots, indicating stable results for all q values.
In order to better investigate this behavior, Figure 3 shows the normalized frequency curves of the best q for each gene, in the sense of higher similarity. It is clearly observed that the higher frequencies are concentrated in the range 2 ≤ q ≤ 3 for both network models and for varied connectivity. This indicates, and reinforces the observation from Figure 1, that the improvement of the inference obtained by taking a non-Shannon entropy (q ≠ 1) does not depend on the network topology.
In particular, considering the frequency curves in Figure 3, the average q* was calculated for each network model given the average connectivity. These averages seem to be almost constant (around 3.20 for the ER model and 3.23 for the BA model) as well as the q's with higher frequencies, i.e., maximum amplitude in the frequency curves.
In order to confirm our findings, we also evaluated the proposed methodology on the DREAM4 in silico network challenge [3]. The time series data of this challenge were considered, which comprise five networks of size 10 and five networks of size 100. The networks of size 10 have 5 time series each, while the networks of size 100 have 10 time series each. Each time series has 21 time points generated from a differential equations model with noise. For each network, the time series were concatenated to form a single expression profile, as in the previous case (AGNs).
The same methodology was applied with similar parameters. Only one additional step was included for the quantization of the DREAM data. The proposed criterion function and the adopted methodology are based on entropy calculations, in which a data quantization step may be required if the original input data are not discrete, as is the case for the DREAM data. The quantization method applied is described in [26]. It was applied by considering 2 levels for networks of size 10 and 3 levels for networks of size 100. In this context, an integer value represents each quantization level. For example, 2 levels means that the quantized signal has only 0's and 1's. Then, each quantized network signal was submitted to the same methodology adopted in the present paper. Figure 4 presents the average results obtained for each DREAM network size: 10 and 100. It is possible to notice an improvement in the similarities by varying the parameter q, with the best results obtained at q ≠ 1 for the two network sizes. Figure 5 presents the normalized frequency with which each q value inferred the best set of predictors (higher similarity) for each gene. The higher frequencies are concentrated in the range 2.2 ≤ q ≤ 4.1 for the DREAM networks of size 10 and 3.2 ≤ q ≤ 5.5 for the DREAM networks of size 100. Regarding the frequency curves in Figure 5, the average q* was calculated for each network size, being around 3.30 for DREAM 10 and 3.92 for DREAM 100. These values are similar to those obtained for the ER and BA networks, with a slightly higher value for the DREAM 100 networks. It is important to highlight the existence of a range of q values that produce better results, on average 2.5 ≤ q ≤ 3.5 (subextensive entropy).
All experimental results confirm that the proposed criterion function can improve the accuracy of the inference process, thus indicating that the network nonextensivity is an important matter of investigation for inference methods based on information theory. As a result, it achieved a better accuracy on the inference of GRNs from gene expression patterns.

Discussion
The use of entropy or mutual information as a criterion function for the problem of network inference is not new and has been widely applied to the inference of GRNs in recent years [1,10,11,13,14,16,17,19,20]. This is explained by the possibility that some genes may be well predicted by observing the states of other genes in a regulatory network, which makes the use of conditional entropies suitable. If the relationship between these genes is linear, a simple Pearson correlation analysis would be enough to get a good description of the gene network. However, when the relationship between genes is not linear but is described by functions of more than one predictor gene, it is expected that inference by methods based on the entropy concept produces better results than those based on Pearson correlation. Naturally, this leads to the need to investigate the sensitivity or robustness of these methods with respect to the extensivity of the applied entropy. In this context, it was verified in a previous work [27] that the entropic parameter q was very important to achieve better results in the GRN inference process. In the present work, we introduce a criterion function by adopting the generalized Tsallis entropy in order to verify the dependence of the inference on the entropy functional form and to characterize how this dependence occurs.
The experimental results provide more evidence about the sensitivity of the inference process to extensive/nonextensive entropies. In addition, the experimental results indicate that the nonextensivity property of the entropy is an important factor to be investigated and explored in the GRN inference process in order to improve its accuracy, thus opening new perspectives for inference methods based on the entropy minimization principle.
As expected, we observed different similarity scores for different entropic parameters q. The maximum similarity score for all tried network models was reached at q ≠ 1, with an improvement of 20% compared to the similarity score for q = 1 (see Figures 1 and 4). In order to better visualize the relevance of this improvement, it is important to take a closer look at the correctly and incorrectly inferred edges. For a network with N genes, $N^2$ directed edges are possible when every node is connected to itself and to every other node ($C_{ij} = 1$ for all 1 ≤ i, j ≤ N). As the simulations were made with 1 ≤ 〈k〉 ≤ 5, C was always a sparse matrix, with the number of connections between the genes given by TP + FN. Table 1 presents the best number of correctly and incorrectly inferred edges by considering each gene individually. It is possible to observe a very good accuracy in recovering correct edges (TP and FP) in the ER and BA models by adopting q = 2.5 (subextensive entropy). In this context, the recovery of false connections (FP) seems to be dependent on the best entropy functional form. On the other hand, it does not seem to depend on the network model. Therefore, in order to improve the inference it is necessary to introduce information about the class model in the method. Furthermore, another observed property that does not depend on the network class model is the reduction in the number of inferred false connections (FP), i.e., cases in which the algorithm infers a connection that does not exist between a pair of genes. This indicates a more conservative inference when an adjusted q is used, even for networks with high connectivity: the number of FP connections for 〈k〉 = 5 obtained by the Shannon entropy was more than six times greater than that obtained by the generalized entropy with the adjusted q = 2.5 for the BA networks and more than eight times greater for the ER networks.

Figure 5: Frequencies where the q value appears in the better answer for DREAM networks. Frequencies where the q value appears in the better answer for each target gene, by considering the average results over the five DREAM networks available of each size: 10 and 100 genes.

Table 1: The best results found for q = 2.5 compared with q = 1.0 by considering each gene individually in the same network: (a) uniformly-random Erdös-Rényi model (ER) and (b) scale-free Barabási-Albert network topology (BA).

It was also observed that distributions with mass concentrated in one of the classes are less penalized by applying q values near 2.5. Considering that the system (organism) has a stochastic behavior and can receive external perturbations, it is expected that the class distributions are not deterministic among the possible classes, i.e., 0 or 1 in the binary case. In other words, given the nature of the system, it is desirable for the method to infer connections from classes with concentrated distributions and few errors among its classes (Table 2(b)) rather than from more uniform distributions in one of the classes and no errors in the other (Table 2(a)). An important observed issue is that subextensive entropies, e.g., q values near 2.5, promote this preference in the presented inference method. Table 2 shows an example of probability distribution that illustrates this situation. The predictor states are in the first column and the number of observed states for the target states in columns two and three, thus generating a probability mass distribution table for a target gene by observing its predictor states. Columns four, five and six give the criterion function results (conditional entropy) for each distribution using different q values. The mean conditional entropy results marked with * represent the minimum achieved by the method, and the corresponding gene is therefore selected as predictor for the target by the inference method.
As we can see, the minimum criterion function score changes with q, and so does the gene selected as predictor. For q = 0.5 and 1.0 the method selected gene A as the best predictor, while gene B is selected for q = 2.5. For almost probable states, the derivative of the generalized entropy increases as q decreases (see Figure 6). This behavior allows $S_q(\text{target}|B = 1)$ to be significantly greater than $S_q(\text{target}|A = 1)$ depending on q. In this context, distributions concentrated in one of the classes (few errors) can produce higher conditional entropy values, which can be strongly amplified by the predictor distribution mass. Therefore, when q = 0.5 or 1.0 the method selects the predictor gene A, since it induces a null entropy on the target when A is active, despite the high entropy induced on the target when gene A is inactive. However, when q is set to 2.5 (subextensive entropy) the balance between the conditional entropy and the predictor probability mass is adjusted in order to produce better accuracy in the inference process.
In summary, this situation characterizes how the subextensive entropy (q = 2.5) produces better results. In this example, a single target gene with a fixed number of time points in its expression data is considered. Hence, Tables 2(a) and 2(b) characterize two frequency distributions that produce different predictors for the same target gene under different values of q, with q = 2.5 (subextensive entropy) selecting the correct predictor for the target. This example illustrates the trade-off between the conditional entropy of the target and the probability distribution of the predictor.
Tables 1(a) and 1(b) present the results obtained with a single value of the entropic parameter, q = 2.5, in order to show how the improvements are achieved by fixing q in the range 2.5 ≤ q ≤ 3.5 (subextensive entropy). However, the main point in the Tsallis theory is that there is no universal q that should be used on every data set. The optimal q should be set by the system (or kind of system), e.g., we have observed that this value was close to 2.5 for Boolean networks and to 3.5 for the DREAM networks. If we pay attention to Figures 2(a) and 2(b), it can be noted that not only the average similarity is improved by considering q = 2.5 instead of q = 1, but also the best and worst inferences (the highest and lowest lines in the boxplot) obtained in the sample dataset. Besides, it can also be observed that the variance of the similarity is almost constant with respect to q (q = 1 and q = 2.5) for low levels of connectivity (small k) and is reduced for high levels of connectivity (large k) when q = 2.

Table 2: Example of change on the inferred predictor by using different values for q entropic parameter.
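The edge counts discussed in this section (TP, FP and FN) can be obtained by comparing the connectivity matrix C of the source network with that of the inferred network. The sketch below is an illustration only: the function name `edge_counts` and the two 3-gene matrices are hypothetical, and no particular orientation of C (row as target or as predictor) is assumed beyond both matrices using the same one.

```python
def edge_counts(true_adj, inferred_adj):
    """Count TP, FP, FN and TN over all ordered gene pairs (i, j)."""
    tp = fp = fn = tn = 0
    for true_row, inferred_row in zip(true_adj, inferred_adj):
        for is_edge, is_inferred in zip(true_row, inferred_row):
            if is_edge and is_inferred:
                tp += 1  # existing connection that was recovered
            elif is_inferred:
                fp += 1  # inferred connection that does not exist
            elif is_edge:
                fn += 1  # existing connection that was missed
            else:
                tn += 1  # absent connection correctly left out
    return tp, fp, fn, tn


# Hypothetical 3-gene example (not data from the paper).
true_adj = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
inferred_adj = [[0, 1, 1],
                [0, 0, 0],
                [1, 0, 0]]

tp, fp, fn, tn = edge_counts(true_adj, inferred_adj)
print(tp, fp, fn, tn)
```

In this toy case the number of connections of the source network equals TP + FN, as stated in the text.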

Conclusions
In general, reverse-engineering algorithms using time series data need to be improved [1]. The present work opens new perspectives for methods based on information theory, in face of all results discussed which show a relevant improvement on the inference accuracy by adopting nonextensive entropies proposed by Tsallis. In particular, the subextensive entropies provide a remarkable improvement of accuracy by reducing the number of false connections detected by the method. The obtained experimental results showed the importance of the range of values 2.5 ≤ q ≤ 3.5 (subextensive entropy).
An interesting point regards the logic circuits created by Boolean functions and its dynamics. The inference method finds some of them independent of the q value, while others are found by tuning this parameter, as presented in the previous section. Future works should investigate the Boolean functions or logic circuits that are sensitive to entropic parameter q and the local structures formed by them.
The inference algorithm and criterion function described in this work were implemented and included in the DimReduction software [26], which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.

Selecting predictors by conditional entropy
The mutual information may be understood as a measure of the dependence between variables. This dependence is quantified by the average reduction in the uncertainty about a variable $v_i$ given knowledge of another accessible variable $v_k$, and vice versa. In this sense, the mutual information indicates how much the prediction error of the state of $v_i$ changes if we know the state of $v_k$. Given two random variables $v_i$ and $v_k$, their mutual information can be calculated by [28]:
$$I(v_i, v_k) = S(v_i) - S(v_i|v_k), \tag{1}$$
where
$$S(v_i) = -\sum_{v_i} P(v_i) \ln P(v_i) \quad \text{and} \quad S(v_i|v_k) = -\sum_{v_k} P(v_k) \sum_{v_i} P(v_i|v_k) \ln P(v_i|v_k)$$
are the Boltzmann-Gibbs entropy of the gene $v_i$ and its conditional entropy on the gene $v_k$, also known as the Shannon entropy and its conditional entropy, respectively.
If the states of the genes taken into account in Equation 1 are collected at distinct times, i.e., $v_i(t+1)$ and $v_k(t)$, the mutual information can be used to select predictor genes ($v_k(t)$) as those that minimize the uncertainty on the target gene ($v_i(t+1)$). Thus, the method consists in finding the gene $v_k$ that maximizes Equation 1 for a given target gene $v_i$, which is equivalent to finding the gene $v_k$ that minimizes the conditional entropy $S(v_i(t+1)|v_k(t))$. Despite the symmetry of $I(v_i(t+1), v_k(t))$ with respect to the variables $v_i(t+1)$ and $v_k(t)$, it is possible to infer a causality between them, since the state variables belong to different time instants, t and t + 1. As $I(v_i(t+1), v_k(t))$ is not necessarily equal to $I(v_k(t+1), v_i(t))$, this causality can be estimated by the difference between $I(v_i(t+1), v_k(t))$ and $I(v_k(t+1), v_i(t))$ or, in a simple way, by $S(v_i(t+1)|v_k(t))$.
Naturally, the mutual information is not restricted to pairs of genes and we can use it to infer the dependence of $v_i$ on groups of genes: $I(v_i(t+1); \{v_j \ldots v_k\}(t)) = S(v_i(t+1)) - S(v_i(t+1)|\{v_j \ldots v_k\}(t))$. Therefore, given a set D of temporal gene expression profiles from a network, the method looks for the group of genes that maximizes Equation 1 for each gene. If $I(v_i(t+1); \{v_j \ldots v_k\}(t))$ presents the maximum score calculated from D, then each gene of $\{v_j \ldots v_k\}$ is directly connected to $v_i$ as a predictor.
In the same way, if there is no group that causes significant variations in the mutual information, then $v_i$ is selected as a source or an isolated gene (the latter in the case that $v_i$ is not selected as a predictor of any gene). Since the method is applied to each gene individually, the individual entropy of the target, $S(v_i(t+1))$, is kept constant during the search for predictors, and as a result the method returns as predictors the genes that produce the lowest conditional entropy $S(v_i(t+1)|\{v_j \ldots v_k\}(t))$. In other words, the mutual information can be calculated as the difference between the individual entropy $S(v_i(t+1))$ and the mean conditional entropy $S(v_i(t+1)|\{v_j \ldots v_k\}(t))$, by considering a group of genes $g(t) = \{v_j \ldots v_k\}(t)$. Therefore, the difference between $I(v_i, v_k)$ and $I(v_i, g)$ is due to the mean conditional entropy, since the individual entropy of $v_i$, $S(v_i)$, is exactly the same in both $I(v_i, v_k)$ and $I(v_i, g)$.
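The selection of a single predictor by time-lagged mutual information can be sketched as follows. This is a minimal illustration, assuming relative frequencies as probability estimates and natural logarithms; the function names and the 3-gene toy profile (in which gene 0 at time t + 1 copies gene 1 at time t) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a distribution given by counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def lagged_mutual_information(profile, target, predictor):
    """I(v_target(t+1); v_predictor(t)) = S(target) - S(target | predictor)."""
    pairs = [(profile[t][predictor], profile[t + 1][target])
             for t in range(len(profile) - 1)]
    s_target = entropy(Counter(y for _, y in pairs).values())
    s_conditional = 0.0
    for x, n_x in Counter(x for x, _ in pairs).items():
        conditional_counts = Counter(y for xx, y in pairs if xx == x)
        s_conditional += n_x / len(pairs) * entropy(conditional_counts.values())
    return s_target - s_conditional


# Hypothetical profile: gene 0 at t+1 equals gene 1 at t; gene 2 is unrelated.
profile = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1],
           [1, 1, 0], [1, 0, 1], [0, 1, 0]]

scores = {k: lagged_mutual_information(profile, target=0, predictor=k)
          for k in range(3)}
best_predictor = max(scores, key=scores.get)
print(best_predictor, scores)
```

Because the entropy of the target is the same for every candidate, ranking the candidates by mutual information or by the conditional entropy alone selects the same predictor.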

Beyond the Boltzmann entropy
The concept of entropy was introduced by Clausius in the context of Thermodynamics considering only macroscopic statements [29]. Some years later, motivated by the idea of relating it to Classical Mechanics, Boltzmann showed that this entropy could be expressed in terms of the probabilities associated with the microscopic configurations of the system [30]. However, his mathematical demonstration relied on some assumptions about the nature of the physical system in order to recover the properties of the Clausius macroscopic entropy from his microscopic approach, e.g., short-range interactions, a necessary condition to assure the extensivity of the Boltzmann entropy [6,31]. Thus, despite the great importance and success of the Boltzmann entropy, there are situations where such conditions are not preserved [32] and the Boltzmann entropy will hardly recover the properties of the Clausius entropy.
Inspired by the probabilistic description of multifractal geometry, C. Tsallis proposed in 1988 a generalization of the Boltzmann entropy [5] which, along two decades, has been successful in presenting desired properties of Statistical Physics Theory [6,33] with great experimental accordance [31].
The proposed definition is [5]
$$S_q = k \, \frac{1 - \sum_{i=1}^{w} p_i^q}{q - 1}, \tag{2}$$
where k is a positive constant (which sets the dimension and scale), w is the number of distinct configurations of the system, $p_i$ is the probability of the i-th configuration and $q \in \mathbb{R}$ is the entropic parameter.
The entropic parameter characterizes the degree of nonextensivity, and the limit $q \to 1$ recovers $S = -k \sum_{i=1}^{w} p_i \ln p_i$, with k being set to the Boltzmann constant $k_B$.
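The limit can be checked with a short expansion, added here for illustration. It assumes the standard Tsallis form $S_q = k (1 - \sum_i p_i^q)/(q - 1)$ for Equation 2 and uses only the normalization $\sum_i p_i = 1$:

```latex
\begin{align*}
p_i^{q} &= p_i\, e^{(q-1)\ln p_i}
         = p_i\left[1 + (q-1)\ln p_i + O\big((q-1)^2\big)\right],\\
1 - \sum_{i=1}^{w} p_i^{q} &= -(q-1)\sum_{i=1}^{w} p_i \ln p_i + O\big((q-1)^2\big)
  \qquad \text{(using } \textstyle\sum_i p_i = 1\text{)},\\
\lim_{q\to 1} S_q &= \lim_{q\to 1} k\,\frac{1 - \sum_{i=1}^{w} p_i^{q}}{q-1}
  = -k \sum_{i=1}^{w} p_i \ln p_i .
\end{align*}
```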
The Boltzmann-Gibbs entropy is said to be extensive in the sense that, for a system consisting of N independent but equivalent subsystems $v = \{v_1, v_2, \ldots, v_N\}$, the entropy of the system is given by the sum of the entropies of the subsystems: $S(v) = N S(v_1)$ [31]. In the Tsallis entropy, this extensivity is set by the parameter q, which can be clearly visualized by the compound rule [31]:
$$\frac{S_q(A+B)}{k} = \frac{S_q(A)}{k} + \frac{S_q(B)}{k} + (1 - q)\,\frac{S_q(A)}{k}\,\frac{S_q(B)}{k}, \tag{3}$$
with A and B being independent systems, i.e., P(A,B) = P(A)P(B). We can observe superextensivity for q < 1, extensivity for q = 1 and subextensivity for q > 1. More specifically, $S_q$ is always nonnegative for q > 0. Although it is also possible to have $S_q > 0$ for some q < 0, q > 0 is generally used to avoid divergences or some inconsistencies [31]. Equation 2 has been widely applied to different physical problems (see, e.g., http://www.cbpf.br/GrupPesq/StatisticalPhys/biblio.htm for a large bibliography), leading to good agreement with experimental data. Naturally, despite these applications, it can be asked whether the Tsallis entropy is also suitable to code information in a general way, as Shannon [34], Khinchin [35] and Kullback [36] showed the Boltzmann entropy to be. Some papers have been published verifying the mathematical foundation of the Tsallis entropy, similarly to the axiomatic approach used by Khinchin [37,38], as well as investigating its nonadditive features and their interpretations [6,39]. As in typical physical problems, there are some examples where the Boltzmann-Shannon entropy is not suitable [40]. Besides, it is also possible to define a divergence equivalent to the Kullback-Leibler divergence [41].
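The compound rule and the three regimes can be verified numerically. The sketch below assumes k = 1 and the standard pseudo-additive form $S_q(A+B) = S_q(A) + S_q(B) + (1 - q) S_q(A) S_q(B)$ for independent systems; the two probability vectors and the function names are hypothetical.

```python
import math
from itertools import product


def tsallis_entropy(probs, q):
    """Tsallis entropy with k = 1; the q = 1 case is the Shannon limit."""
    probs = [p for p in probs if p > 0]
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def compound_rule(pa, pb, q):
    """Return (entropy of the joint system, right-hand side of the rule)."""
    joint = [a * b for a, b in product(pa, pb)]  # independence: P(A,B) = P(A)P(B)
    s_a, s_b = tsallis_entropy(pa, q), tsallis_entropy(pb, q)
    return tsallis_entropy(joint, q), s_a + s_b + (1.0 - q) * s_a * s_b


# Hypothetical independent systems A and B.
pa = [0.2, 0.8]
pb = [0.5, 0.3, 0.2]

for q in (0.5, 1.0, 2.5):
    s_joint, rule = compound_rule(pa, pb, q)
    additive = tsallis_entropy(pa, q) + tsallis_entropy(pb, q)
    print(q, s_joint, rule, s_joint - additive)
```

The last printed column is the deviation from plain additivity: positive for q < 1, zero (up to rounding) for q = 1 and negative for q > 1.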
By defining $\ln_q(x) \equiv (x^{1-q} - 1)/(1 - q)$, Equation 2 can be written in a form similar to the Boltzmann entropy, $S_q = -k \sum_{i=1}^{w} p_i^q \ln_q p_i$. In this way, a generalized mutual information between $v_i$ and $v_k$ can be defined as in [41] (Equation 4). The generalized mutual information has the necessary properties to be used as a criterion measure for consistent testing [42] and, like Equation 1, it reaches its minimum value when $P(v_i|v_k) = P(v_i)$ and its maximum when $-\sum_{v_i} P(v_i|v_k)^q \ln_q P(v_i|v_k)$ vanishes [41], which is equivalent to making $-\sum_{v_i, v_k} P(v_k) P(v_i|v_k)^q \ln_q P(v_i|v_k)$ vanish. It is hence possible to look for dependencies between $v_i$ and $v_k$ by minimizing $S_q(v_i|v_k)$.
For binary genes, $v_i \in \{0, 1\}$, we have $S_q(v_i) = [P(v_i = 1)^q + (1 - P(v_i = 1))^q - 1]/(1 - q)$ and the influence of the entropic parameter q can be easily observed. In Figure 6 the maximum entropy for the gene increases as q decreases, approaching the limit $S_q^{\max} = 1$ as $q \to 0$. Indeed, when q ≈ 0, $S_q(v_i)$ will be significantly different from $S_q^{\max}$ only for $P(v_i = 1) \approx 0$ or $P(v_i = 1) \approx 1$, which means a very rigid criterion in the sense that either the predictor candidates fulfill all the constraints imposed by the data or they cannot be selected as predictors. On the other hand, $S_q^{\max} \to 0$ for $q \gg 1$, which can be interpreted as a very flexible criterion function in the sense that any gene or group of genes can be selected as a good predictor.
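The dependence of the binary entropy on q can be explored with a few lines of code. The sketch assumes k = 1; the function name `binary_tsallis_entropy` and the chosen grid of q values are hypothetical.

```python
import math


def binary_tsallis_entropy(p, q):
    """S_q of a binary gene with P(v_i = 1) = p, taking k = 1."""
    if math.isclose(q, 1.0):
        return -sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0)
    return (p ** q + (1.0 - p) ** q - 1.0) / (1.0 - q)


# The maximum is attained at p = 0.5; compare it with a concentrated distribution.
for q in (0.001, 0.1, 0.5, 1.0, 2.5, 10.0, 50.0):
    s_max = binary_tsallis_entropy(0.5, q)
    s_concentrated = binary_tsallis_entropy(0.01, q)
    print(q, s_max, s_concentrated)
```

The printed maxima decrease monotonically with q, approaching 1 for small q and 0 for large q.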
Another interesting point is the ordering of the entropy with respect to $P(v_i = 1)$. If the entropy of $P(v_i = 1) = a$ is larger than the entropy of $P(v_i = 1) = b$ for a given q*, then it will always be larger for any q (see Figure 6). But this ordering is not preserved in the mean conditional entropy. For $S_q(v_i|v_k)$ the entropy of $v_i$ given $v_k$ is weighted by the probability of $v_k$,
$$S_q(v_i|v_k) = -\sum_{v_k} P(v_k) \sum_{v_i} P(v_i|v_k)^q \ln_q P(v_i|v_k), \tag{5}$$
in such a way that the ordering of the mean conditional entropies of two predictor candidates obtained for some q' can be reversed for some q″ ≠ q'. This results in a trade-off between the relevance of the conditional entropy and the probability distribution of the predictor genes.
In the context of feature selection or tests of dependence between variables, in which the entropy is used as a criterion function, this non-preservation of the ordering implies the existence of an optimal q* by which a system can be best reproduced. As in physical problems, q* should be related to the system properties [31], and discovering the laws or principles which relate q* to these properties becomes fundamental to improve recovery methods.
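The non-preservation of the ordering can be illustrated with two candidate predictors whose ranking changes with q. The count tables below are hypothetical (they are not the values of Table 2): gene A leaves the target deterministic in one state and uniform in the other, whereas gene B gives concentrated distributions with a few errors in both states. The mean conditional entropy is taken as the entropy of the target in each predictor state weighted by the relative frequency of that state, with k = 1.

```python
import math


def tsallis_entropy_from_counts(counts, q):
    """Tsallis entropy (k = 1) of the distribution given by counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def mean_conditional_entropy(table, q):
    """table[x] = [count(target = 0), count(target = 1)] for predictor state x."""
    n = sum(sum(row) for row in table)
    return sum(sum(row) / n * tsallis_entropy_from_counts(row, q)
               for row in table if sum(row) > 0)


# Hypothetical count tables (rows: predictor state 0 and predictor state 1).
gene_a = [[50, 50], [100, 0]]   # uniform in one state, no errors in the other
gene_b = [[87, 13], [13, 87]]   # concentrated, few errors in both states


def selected(q):
    """Name of the candidate with the lowest mean conditional entropy."""
    candidates = (("A", gene_a), ("B", gene_b))
    return min(candidates, key=lambda c: mean_conditional_entropy(c[1], q))[0]


for q in (0.5, 1.0, 2.5):
    print(q, selected(q),
          mean_conditional_entropy(gene_a, q),
          mean_conditional_entropy(gene_b, q))
```

With these counts the candidate selected for q = 0.5 and q = 1.0 differs from the one selected for q = 2.5, which is the kind of reversal described above.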

Proposed Method
The algorithm is based on previous works [8,11] and consists in looking for the group of genes that minimizes the criterion function (i.e., conditional entropy) of the target gene. Therefore, for each given target $v_i$ we have to calculate the conditional probabilities $P(v_i(t+1)|v_j(t), \ldots, v_k(t))$ based on the data set $D_T = \{s(1), s(2), \ldots, s(T)\}$, where $s(t) \equiv [v_1(t), v_2(t), \ldots, v_N(t)]$ is the expression vector at time t, i.e., the state of the network at time t. For a network with N genes we have $n_p = \sum_{x=1}^{N} N!/(x!(N-x)!)$ conditional probabilities to be calculated for each gene, i.e., $n_p$ possible groups of predictors. Fortunately, genes are not expected to be regulated by many predictors [43,44] and an upper bound for $n_p$ can be defined. Kauffman observed that chaotic dynamics are more probable for gene networks with $n_p \geq 3$ [43,44] and, by stability principles, he concluded that the average connectivity should be upper bounded by three, since the gene network could be at the frontier of chaos but not chaotic. Herein, we slightly relax the Kauffman statement and set this upper bound on the average connectivity to 〈k〉 ≤ 5.
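The size of the search space can be made concrete by counting the candidate predictor groups. In the sketch below, `n_predictor_groups` is a hypothetical helper and `max_size` is an assumed cap on the number of predictors per group, used only to show how quickly the count grows; it is not a parameter defined in the paper.

```python
import math


def n_predictor_groups(n_genes, max_size=None):
    """Number of non-empty predictor groups, optionally capped in size."""
    top = n_genes if max_size is None else min(max_size, n_genes)
    return sum(math.comb(n_genes, x) for x in range(1, top + 1))


print(n_predictor_groups(100))              # all non-empty groups: 2**100 - 1
print(n_predictor_groups(100, max_size=5))  # groups with at most 5 predictors
```

Without a cap the count equals 2^N - 1, which is why an exhaustive evaluation is not feasible for N = 100.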
Another important point is the possibility of gene networks with different topology classes. In order to verify the sensitivity of the method to the topology class, the topologies of the gene networks were generated with the uniformly random Erdős-Rényi (ER) [45] and the scale-free Barabási-Albert (BA) [46] models. The BA complex network model is one of the most similar to known real regulatory networks [47,48]. Biological network topologies based on Escherichia coli and Saccharomyces cerevisiae [49] were also considered.
We describe below how the artificial gene networks were generated, the algorithm of inference, evaluation metrics and the experimental results.

The inference algorithm and criterion function
Given the temporal data $D_T$, the algorithm fixes a target gene $v_i$ and looks for the group of genes $g$ that minimizes the conditional entropy $S_q(v_i(t+1)|g(t))$ for a fixed $q$. As the number of genes is generally large, the search space becomes so large that an exhaustive search is not feasible. We therefore apply the Sequential Forward Floating Search (SFFS) [50] to circumvent this combinatorial explosion.
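The floating search can be sketched as follows. This is a simplified illustration of the SFFS idea (a forward inclusion step followed by conditional backward exclusions), not the implementation used in DimReduction. The `criterion` argument is a hypothetical callable standing in for $S_q(v_i(t+1)|g(t))$, and the toy criterion at the end is invented purely to exercise the search.

```python
def sffs(candidates, criterion, max_size):
    """Simplified Sequential Forward Floating Search minimizing `criterion`.

    candidates: iterable of hashable gene identifiers.
    criterion:  callable mapping a frozenset of genes to a score (lower is better).
    max_size:   largest predictor group to consider (must be >= 1).
    Returns (best_group, best_score) over all group sizes visited.
    """
    candidates = list(candidates)
    best = {}                      # group size -> (group, score) seen so far
    current = frozenset()
    while len(current) < min(max_size, len(candidates)):
        # Forward step: add the gene that yields the lowest criterion value.
        current = min(
            (current | {g} for g in candidates if g not in current),
            key=criterion,
        )
        score = criterion(current)
        if len(current) not in best or score < best[len(current)][1]:
            best[len(current)] = (current, score)
        # Floating steps: drop a gene while doing so strictly improves the
        # best group already known for the smaller size.
        while len(current) > 2:
            reduced = min((current - {g} for g in current), key=criterion)
            reduced_score = criterion(reduced)
            if reduced_score < best[len(reduced)][1]:
                best[len(reduced)] = (reduced, reduced_score)
                current = reduced
            else:
                break
    return min(best.values(), key=lambda item: item[1])

# Toy criterion (hypothetical): distance to a known predictor set {1, 3}.
true_predictors = frozenset({1, 3})
group, score = sffs(range(5), lambda g: len(g ^ true_predictors), max_size=4)
print(sorted(group), score)
```

Keeping the best group found for each size is what allows the backward steps to undo an earlier inclusion that turned out to be suboptimal, which a purely greedy forward selection cannot do.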
For the calculation of the conditional entropy (Equation 5) it is necessary to estimate the conditional probabilities of the target given its predictor candidates as well as the probabilities of these candidates. In the absence of prior information, these probabilities are estimated by the relative frequencies in $D_T$, which means that the accuracy depends on how representative $D_T$ is. Since we are searching for the lowest entropy, it is not recommended to set the probability of non-observed instances to zero. Some instances may be absent from the temporal expression profile because of the small sample size and/or the dynamics of the system, i.e., the transition functions. Therefore, in order to reach a good trade-off, we follow the penalization of non-observed instances [26,51]. The penalized criterion function adopting the generalized Tsallis entropy is defined as follows:

$$S_q(v_i(t+1)|g(t)) = \frac{a(m-n)}{am+d}\, S_q(v_i) + \sum_{g\ \mathrm{observed}} \frac{r_g + a}{am+d}\, S_q(v_i(t+1)|g(t)), \qquad (6)$$

where $a \geq 0$ is the penalty weight, $m$ is the number of possible instances of the gene group $g$ (predictors), $n$ is the number of observed instances, $d$ is the total number of samples and $r_g$ is the number of occurrences of each observed instance of $g$.
If $a$ is set to zero, there is no penalization and $P(g)$ is estimated by its relative frequency in $D_T$, i.e., by the terms $r_g/d$, with $\sum_g r_g = d$. When $n = m$, the penalization term (the first term in Equation 6) vanishes and $P(g)$ is estimated by a modulated relative frequency of the predictors, obtained by adding $a$ to all instances of $g$, i.e., $(r_g + a)/(am + d)$. Finally, when $n < m$, the parameter $a$ is counted $m - n$ times for the non-observed instances and $n$ times for the observed instances, so that it enters $m$ times in total. Thus, in Equation 6 a positive probability mass, parameterized by $a$, is assigned to the non-observed instances of the gene group $g$ in the expression data.
Furthermore, the penalization of the non-observed instances is weighted by the entropy of the target gene, i.e., $S_q(v_i)$. This is important because a gene can be well described when its uncertainty is small, i.e., the observed instances of the genes are enough to describe the dynamics of a target gene with small entropy. In this paper we set $a = 1$.
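As an illustration of how the penalized criterion could be estimated from a binary time series, consider the sketch below. It assumes $k = 1$ and $q \neq 1$ in the Tsallis entropy, estimates all probabilities by counting transitions from $t$ to $t+1$, and uses hypothetical names (`penalized_conditional_entropy`, `states`) as well as a toy series invented for the example. It is not the DimReduction code.

```python
from collections import Counter, defaultdict

def tsallis_entropy(counts, q):
    """Tsallis entropy (k = 1, q != 1) of the relative frequencies of `counts`."""
    total = sum(counts)
    return (1.0 - sum((c / total) ** q for c in counts)) / (q - 1.0)

def penalized_conditional_entropy(states, target, predictors, q=2.5, a=1.0):
    """Penalized mean conditional entropy of target(t+1) given predictors(t).

    states: list of binary tuples; states[t][j] is the level of gene j at time t.
    """
    # Pair each predictor instance at time t with the target value at t + 1.
    pairs = [
        (tuple(now[j] for j in predictors), nxt[target])
        for now, nxt in zip(states[:-1], states[1:])
    ]
    d = len(pairs)                    # total number of samples
    m = 2 ** len(predictors)          # possible instances of the group
    by_instance = defaultdict(Counter)
    for instance, value in pairs:
        by_instance[instance][value] += 1
    n = len(by_instance)              # observed instances
    target_counts = Counter(value for _, value in pairs)
    # Penalty for the m - n non-observed instances, weighted by S_q(target).
    penalty = a * (m - n) / (a * m + d) * tsallis_entropy(
        list(target_counts.values()), q
    )
    # Observed instances, each weighted by (r_g + a) / (a*m + d).
    observed = sum(
        (sum(c.values()) + a) / (a * m + d) * tsallis_entropy(list(c.values()), q)
        for c in by_instance.values()
    )
    return penalty + observed

# Toy series (hypothetical): gene 0 at t + 1 copies gene 1 at t.
states = [(0, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1),
          (1, 1, 0), (1, 0, 1), (0, 1, 0)]
for group in ([1], [2], [0, 1, 2]):
    print(group, round(penalized_conditional_entropy(states, 0, group), 4))
```

In this toy series the single predictor `[1]` reaches zero, whereas the full group `[0, 1, 2]`, although also deterministic on the instances it observes, is penalized for its two non-observed instances.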
The inference algorithm consists in calculating the mean conditional entropy by using Equation 6 and looking for a group of genes that minimizes it. This search is performed by the SFFS algorithm.

Artificial gene networks
The adopted AGN model was built based on the random Boolean network (RBN) approach [43,44,52]. This model yields insights into the overall behavior of large gene networks, allows the analysis of large data sets in a global way and represents some fundamental characteristics of real GRNs [53-57]. In an RBN model, the state of each gene is a binary variable set by Boolean functions of other genes. The possibility of modeling GRNs as Boolean networks stems from the switch-like behavior that the cell exhibits during regulation of functional states [52,58]. In this context, the gene state is mapped from a continuous expression to a two-level expression (on/off).
More specifically, an artificial gene network (AGN) is defined by a set $V = \{v_1, v_2, \ldots, v_N\}$ of $N$ genes (nodes), an $N \times N$ adjacency matrix $C$ (with $C_{ij} \in \{0, 1\}$) and a set $F = \{f_1, f_2, \ldots, f_N\}$ of $N$ transition functions. In the Boolean approach, each $f_i$ is a logical circuit of the non-null elements of the $i$th row of $C$ that sets the state of the gene $v_i$. Then, the network state at time $t+1$ is an $N$-dimensional vector $s(t+1) = [v_1(t+1), v_2(t+1), \ldots, v_N(t+1)]$ resulting from the application of these functions to the previous state $s(t)$. Besides, the connectivity of $v_i$ is given by $k_i = \sum_{j=1}^{N} (C_{ij} + C_{ji})$ and the topology class of the network is defined by the probability distribution of these connectivities.
The networks used in this paper were obtained by the network generator proposed in [21,22]:
1. define a topology class, i.e., the distribution $P(k)$ of the number $k$ of connections per gene;
2. define the connectivity $k_i$ of each gene $v_i$, setting the predictors ($C_{ji}$'s) and targets ($C_{ij}$'s) by using the $P(k)$ distribution;
3. set the transfer function $f_i$ for each gene $v_i$ by randomly drawing a truth table according to its number of predictors ($n_p = \sum_{j=1}^{N} C_{ji}$), i.e., an output state for each one of the $2^{n_p}$ input states.
Once the AGN is defined, the simulated temporal expression profile $D_T$ is obtained by choosing an arbitrary initial state of the network and successively applying the transfer functions.
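The generation and simulation of a Boolean network can be sketched as follows. This is a minimal illustration and not the generator of [21,22]: it assumes a fixed number of predictors per gene drawn uniformly at random (a Kauffman-style simplification of the topology step) and synchronous updates. All function names and parameter values are hypothetical.

```python
import random

def make_random_boolean_network(n_genes, n_predictors, rng):
    """Draw the predictors and a random truth table for every gene."""
    network = []
    for _ in range(n_genes):
        predictors = rng.sample(range(n_genes), n_predictors)
        # One output bit for each of the 2**n_predictors input states.
        truth_table = [rng.randint(0, 1) for _ in range(2 ** n_predictors)]
        network.append((predictors, truth_table))
    return network

def adjacency_matrix(network):
    """C[j][i] = 1 when gene j is a predictor of gene i."""
    n = len(network)
    c = [[0] * n for _ in range(n)]
    for i, (predictors, _) in enumerate(network):
        for j in predictors:
            c[j][i] = 1
    return c

def next_state(network, state):
    """Apply every transfer function synchronously to the current state."""
    new_state = []
    for predictors, truth_table in network:
        index = 0
        for j in predictors:      # encode the predictor values as a table index
            index = (index << 1) | state[j]
        new_state.append(truth_table[index])
    return tuple(new_state)

def simulate(network, initial_state, n_steps):
    """Return the temporal expression profile [s(1), ..., s(n_steps)]."""
    profile = [tuple(initial_state)]
    for _ in range(n_steps - 1):
        profile.append(next_state(network, profile[-1]))
    return profile

rng = random.Random(0)            # fixed seed so the sketch is reproducible
net = make_random_boolean_network(n_genes=8, n_predictors=2, rng=rng)
start = [rng.randint(0, 1) for _ in range(8)]
profile = simulate(net, start, n_steps=10)
for state in profile:
    print("".join(map(str, state)))
```

Because the network is finite and deterministic, the trajectory eventually falls into an attractor, which is one reason why some predictor instances never appear in a short profile.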
On the other hand, DREAM4 temporal expression profiles were generated by considering network structures based on Escherichia coli and Saccharomyces cerevisiae [49]. The dynamics were generated by continuous differential equations, with perturbations included in the data in order to simulate a physical or chemical intervention. Gaussian noise was also added in order to simulate expression measurement error. In summary, the DREAM4 time series database presents networks of two sizes, with 10 and 100 genes, and expression profiles generated by differential equations with perturbation and noise. A detailed description is provided on the DREAM project website [3].
In both cases (AGN and DREAM networks), the temporal expression profile $D_T$ is submitted to the inference method and its results are evaluated according to the measures presented in the next section.

Evaluation
In order to quantify the similarity between the source gene network A and the inferred network B, we adopted the validation metric based on a confusion matrix [59] (see Table 3).
The networks are represented in terms of their respective adjacency matrices $C$, such that each connection from gene $i$ to gene $j$ implies $C_{ij} = 1$, with $C_{ij} = 0$ otherwise. Then, in order to quantify the quality of the inferred network, the similarity measurements [60] widely used to compare inference methods were adopted, calculated as follows:

$$PPV = \frac{TP}{TP + FP}, \qquad Specificity = \frac{TN}{TN + FP}, \qquad (7)$$

where $TP$, $FP$ and $TN$ are the numbers of true positives, false positives and true negatives in the confusion matrix (Table 3). Since the measurements in Equation 7 are not independent of each other, we adopted the geometric mean of the ratio of correct ones, PPV (Positive Predictive Value, also known as precision), and the ratio of correct zeros (Specificity), comparing the ground-truth matrix A with the inferred matrix B. In this way, both coincidences and differences are taken into account by these measures, with the maximum similarity corresponding to values near 1.
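These measures can be computed from two adjacency matrices as in the following sketch. It assumes that every entry of the matrices, including the diagonal, is counted in the confusion matrix and that a ratio with a zero denominator is reported as 0; both choices, as well as the function names and the small example matrices, are assumptions made for illustration.

```python
import math

def confusion_counts(truth, inferred):
    """Count TP, FP, TN and FN over all entries of two 0/1 adjacency matrices."""
    tp = fp = tn = fn = 0
    for row_t, row_i in zip(truth, inferred):
        for t, i in zip(row_t, row_i):
            if t and i:
                tp += 1
            elif i:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def similarity(truth, inferred):
    """Return (PPV, specificity, geometric mean of both)."""
    tp, fp, tn, _ = confusion_counts(truth, inferred)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return ppv, specificity, math.sqrt(ppv * specificity)

# Hypothetical 3-gene example: two correct edges, one false edge, one missed edge.
A = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
B = [[0, 1, 0],
     [0, 0, 1],
     [0, 1, 0]]
ppv, spec, sim = similarity(A, B)
print(f"PPV={ppv:.3f}  Specificity={spec:.3f}  Similarity={sim:.3f}")
```

For this example the counts are TP = 2, FP = 1, TN = 5 and FN = 1, giving PPV = 2/3, Specificity = 5/6 and a similarity of about 0.745. Note that false negatives do not enter either ratio, so the measure mainly rewards avoiding false connections.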