Open Access

Information processing in the transcriptional regulatory network of yeast: Functional robustness

BMC Systems Biology20093:35

DOI: 10.1186/1752-0509-3-35

Received: 03 September 2008

Accepted: 19 March 2009

Published: 19 March 2009

Abstract

Background

Gene networks are considered to represent various aspects of molecular biological systems meaningfully because they naturally provide a systems perspective of molecular interactions. In this respect, the functional understanding of the transcriptional regulatory network is considered as key to elucidate the functional organization of an organism.

Results

In this paper we study the functional robustness of the transcriptional regulatory network of S. cerevisiae. We model the information processing in the network as a first order Markov chain and study the influence of single gene perturbations on the global, asymptotic communication among genes. Modification in the communication is measured by an information theoretic measure allowing to predict genes that are 'fragile' with respect to single gene knockouts. Our results demonstrate that the predicted set of fragile genes contains a statistically significant enrichment of so called essential genes that are experimentally found to be necessary to ensure vital yeast. Further, a structural analysis of the transcriptional regulatory network reveals that there are significant differences between fragile genes, hub genes and genes with a high betweenness centrality value.

Conclusion

Our study does not only demonstrate that a combination of graph theoretical, information theoretical and statistical methods leads to meaningful biological results but also that such methods allow to study information processing in gene networks instead of just their structural properties.

Background

The advent of high-throughput technologies in molecular biology has initiated an avalanche of data that possess considerable challenges to quantitative sciences providing statistical analysis methods [1]. Due to the fundamental insight that biological processes should be studied holistically [24] instead of reductionistically, systems based approaches are of central importance in this respect [5]. For this reason, it is no surprise that network related studies experience an enormous interest starting with the investigation of small-world [6, 7] and scale-free [8, 9] networks in the mid 1990's followed by numerous studies devoted to the analysis of complex network topologies and their properties in general [8, 1014]. It is interesting to note that many apparently different networks have similar properties. Most prominent example is the degree distribution. For example, the World-Wide Web, the Internet or biological networks are found to be scale-free [8, 10, 11, 15, 16] with respect to their degree distribution. In molecular biology, metabolic, transcriptional regulatory, signaling and protein networks have been studied extensively during the last years [4, 1719] to shed light on the functional organization of these complex gene networks [20]. In this context, functional robustness is considered a key player for our understanding regarding the interplay of network structure and network dynamics leading to the emergence of life as omnipresent around us [9, 2123].

For general networks, one of the first studies that has thoroughly investigated structural robustness of systems that can be represented as networks is from ALBERT et al. [24]. ALBERT studied the error and attack tolerance of synthetic as well as real world networks and compared random and scale-free networks, e.g., the World-Wide Web or the Internet. By using purely graph theoretical measures – the diameter of the network and the size of the largest connected component – they found that scale-free networks are much more robust against random errors than random networks but more vulnerable against directed attacks. In the context of gene networks the interest shifts from the structural robustness of the networks to their functional robustness because the ultimate goal is of course to gain insights into the function of a living cell or an organism respectively. On a time scale of a living organism the question of functional robustness has been addressed by [2528]. For example, in [27] the dynamics of Boolean networks [29, 30] were studied serving as a simplified model for the signal processing taking place in gene networks. As major result [27] found that fluctuations occurring inevitably within the system, e.g., due to the inherent noise present on a molecular level [31, 32], can be suppressed by a suitable design of the overall network topology [27]. On an evolutionary time scale the functional robustness of gene networks has been studied by [3335] considering directly the role selective pressure might play during evolution leading to observable patterns of, e.g., protein structures, gene expression or network structures as present in current organisms. In this paper we tie up with previous studies aiming to analyze the functional robustness of networks on a time scale of living organisms. By pointing out the time scale we want to emphasize that we do not investigate the evolutionary robustness of an organism. Instead, the major objective of this paper is to investigate the functional robustness of the transcriptional regulatory network (TRN) of S. cerevisiae with respect to single gene perturbations. As quantitative measure of functional robustness we suggest to use an information theoretic measure [36], previously used to study synthetic networks, that does not focus directly on structural changes of the network topology due to the perturbations but on the alterations of information flow, modeled as Markov Chain [37], within the network as consequence of the structural modifications. The advantage of information theoretic measures [3840] is that the concrete underlying dynamics does not need to be specified precisely, instead, a qualitative model is enough to gain principle insights into common working mechanisms with regard to more elaborate biological models. General entropy measures for quantifying structural information in networks have been developed in [41, 42]. For our study, we use the transcriptional regulatory network of yeast [43, 44] and apply our information theoretic measure to identify genes that are crucial for the functioning of the organism in the sense that disruptions of the transcriptional regulatory network are experienced strongest by these genes. For this reason we call these genes fragile. In this paper, we quantify our results by connecting these to the list of known so called essential genes of yeast [45] to demonstrate that our predictions are biologically meaningful.

Methods

In this section we present the information theoretic measure we use to analyze the transcriptional regulatory network of yeast to study its functional robustness.

Markov chains

We approximate the information flow in the network as a Markov chain. A Markov chain is a Markov process that is discrete in time and space. We define a Markov process by using a given network topology G and the plausible assumption that all possible interactions are equal likely. Plausible in this context does not necessarily mean that this corresponds best to the real situations, it means that it is the most simple and unbiased assumption one can make. For simplicity, we further assume the Markov process to be of first-order

T(Xt+1= j|X t = i t , ..X1 = i1) = T(Xt+1= j|X t = i t ).

That means, the transition probability T depends only on the last state and not on states that are further in the past.

Definition 1: The transition probability T for a Markov chain of first-order for a network G with adjaceny matrix A is defined by
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Equ1_HTML.gif
(2)

for all i, j V.

Here k i = ∑ j A ji is the degree of node (gene) i in the network and A ij is a component of the adjacency matrix indicating if node i is connected with node j (A ij = 1) or unconnected (A ij = 0). V denotes a set comprising all genes.

Single gene perturbations

In this paper we study the effect of single gene perturbations on the information processing in the transcriptional regulatory network of yeast. Formally, we define perturbations in the following way.

Definition 2: (Single gene perturbations) If a gene k in network G is perturbed than all outgoing and incoming edges from this gene are deleted. In addition, one self-connection is introduced.

In Fig. 1 and 2 we visualize a single gene perturbation. One can see that the perturbed gene (shown in red) does no longer participate in the information processing in the network. However, the remainder of the network is still structurally intact and capable to transmit signals. Hence, a single knockout can be considered as a perturbation and the modified communication among the remaining genes can be studied principally provided there is a measure to quantify these alterations. This measure is given in the next subsection.
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Fig1_HTML.jpg
Figure 1

A graph depicts the flow of information in the network.

https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Fig2_HTML.jpg
Figure 2

Perturbing the node shown in red leads to a breakdown of communication between the red node and all other nodes. However, there is still an information flow in the remaining network.

Asymptotic Communication

The information theoretic measure we use to capture the asymptotic behavior of information processing evaluates the deviation of the unperturbed (or normal (n)) state from the perturbed (p) state caused by the perturbation of gene k. We use the the relative entropy also known as Kullback-Leibler (KL) divergence D [46, 47] to quantify this deviation. Our asymptotic measure is given by the following definition.

Definition 3: (Asymptotic information change)
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Equ2_HTML.gif
(3)
Here https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq1_HTML.gif and https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq2_HTML.gif are stationary distributions obtained by
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Equ3_HTML.gif
(4)
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Equ4_HTML.gif
(5)
The Markov chain given by T k corresponds to the process obtained by perturbing gene k in the network. The stationary distribution https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq1_HTML.gif of the perturbed (p) network obtained by perturbing gene k and starting from the initial distribution,
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Equ5_HTML.gif
(6)

depends on i because we use the Kronecker delta, which is one for i = m and zero otherwise, as initial condition. The reason therefore is we consider i as starting point for the spread of information in the network. The interpretation for the unperturbed (normal (n)) distribution https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq2_HTML.gif is correspondingly. We want to note that due to the directedness of the network the Markov process is not ergodic which results in a dependence of the asymptotic distributions https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq1_HTML.gif and https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq2_HTML.gif on the initial distribution https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq3_HTML.gif . For this reason it is important to use |V| - 1 (starting from k is excluded because the perturbed gene has no longer outgoing edges) different initial distributions https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq3_HTML.gif to evaluate D ik . That means Eq. 3 defines the components of a matrix and the interpretation of D ik is that the index k correspond to the deletion of gene k and index i referes to the initial distribution https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq3_HTML.gif = δ i, m . The diagonal elements D ik (i = k) are not defined.

Results and discussion

Data

For our analysis we use the transcriptional regulatory network of yeast [43, 44] which is a directed, unweighted network. From this network we extract the weakly connected component consisting of 3357 genes and 7230 interactions. The weakly connected component of a network is defined as the subnetwork that connects every pair of nodes by at least one directed path. That means for every pair of genes the weakly connected component ensures that communication (at least in one direction) between these genes is in principle enabled. This is an important characteristic because in our analysis we are aiming to quantify modifications of the communication among genes due to perturbations. Hence, if there would be no path between genes such an analysis would not be sensible.

On a practical note, we want to remark that our theoretical analysis described in detail in the next section is computationally expensive because we perform single gene perturbations for all genes in the network. That means, we do not just analyze one network with our method but as many as genes in the network. Hence, the results presented in this article are obtained by analyzing 3357 networks. It is clear that this is getting more and more demanding computationally by increasing the number of genes in the network. From our simulations we found that networks with several thousand nodes can be studied within reasonable time whereas larger networks would require more algorithmic attention to reduce the computation time.

Results

Now we study the asymptotic behavior of the transcriptional regulatory network of yeast regarding information propagation under the influence of single gene perturbation.

For the normal (unperturbed) and perturbed network topology of the transcriptional regulatory network we determine Markov chains from which we calculate the stationary distributions. The perturbations correspond to single gene perturbations and the Markov chains are obtained as described in the methods section. From the resulting stationary distributions of the Markov chains we calculate the Kullback-Leibler divergence D ik = D ( https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq1_HTML.gif || https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq2_HTML.gif ) for all genes i V and perturbations k V with ik. We want to note that due to the directedness of the network the resulting Markov process is no longer ergodic. Hence, information sent from different genes can results in different stationary distributions. For this reason, we use all N genes consecutively as sender gene. This is reflected by the index i in Eq. 3 corresponding to the gene from which the information was sent initially. On a mathematical note we want to remark that the network does not need to be disconnected to result in a non-ergodic Markov chain. However, the need to consider different initial conditions to study the behavior of the resulting stationary distributions meaningfully remains also true in this case.

We begin our analysis by investigating if the asymptotic results summarized by D ik can be connected to local, structural properties of the genes in the network. For this reason we determine all genes with
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Equ6_HTML.gif
(7)

and calculate the correlation with the in- and out-degree vector of the network. More precisely, we calculate Spearman's rank-order correlation coefficient [48] between the rank ordered vectors to decide if the order in these vectors is statistically preserved. For the in-degrees we obtain a correlation of r = -0.39 and p = 6 × 10-9, for the out-degrees r = 0.33 and p = 1 × 10-6. Using a significance level of α = 0.05 indicates that both rank correlations are statistically significant implying that, e.g., high out-degrees correspond to high values of D i . These results seem plausible considering the following situation: For a given gene that is connected to all other genes (outgoing edges) it is clear, that an arbitrary knockout of a single gene effects with probability one an outgoing edge of this gene. Hence, this knockout will have an influence on the information processing of this gene. The strength of this influence can not be easily predicted given just this information, however, we will have an influence with probability one. Instead, a gene having very few outgoing connections has a lower probability that a single knockout effects one of its outgoing edges (Pr = k out /N p with N p the number of genes that can be perturbed). However, it is possible that the knocked out gene destroys some communication paths (secondary- or even higher-order effect if measured as Dijkstra distance [49]) and, hence, can still have a strong impact on the information processing. It seems to be reasonable to assume that the further away the knockout gene is from the starting gene (in Dijkstra distance [49]) the less the impact will be. This is a strong indicator that information processing on a systems level depends crucially on the information processing in a local environment of the gene that sends the information. We want to remark that in our analysis the number D i , given in Eq. 7, is a global measure, whereas the degree vector is a local measure. This result is interesting because it demonstrates that the local properties of genes, given by their local connectedness, which can be roughly summarized by their degrees, are not averaged away with respect to the stationary distribution of the Markov process. That means the local connectivity signature is still detectable in the asymptotic behavior. We will come back to this point in the discussions section because this is a non-trivial point.

In Fig. 3 we show the components of the asymptotic information change D ik for which D i ≥ 0.1 holds (149 genes). Blue corresponds to low values (zero) and cyan to high values of D ik (the maximal value of D ik is 20.47). The vertical stripes indicate that the knockout of a few genes effects many other genes whereas most knockouts have only a minor effect on other genes. This is also the reason why we do not show D ik for N = 3357 genes because in this case the figure would appear essentially blue.
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Fig3_HTML.jpg
Figure 3

Asymptotic information change D ik for all genes with D i ≥ 0.1.

In Fig. 4 we show the histogram of D i . From this figure one can see that the distribution of D i has a heavy tail and that most values are around zero. This indicates that our measure has the desirable property to be very selective by evaluating most perturbations as minor. This corresponds to experimental results showing that only about 10% of all genes in yeast are categorized as essential [50] which means that their knockout has a catastrophic influence on the organism.
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Fig4_HTML.jpg
Figure 4

Histogram of D i for all genes with D i > 0.

To point out the properties of our measure we show in Fig. 5 results connecting genes with high D i values quantitatively to the appearance of essential genes from gene-deletion experiments in yeast [45]. Figure 5 shows N e /N c in dependence on Θ D . N c is the number of genes found for which D i > Θ D holds and N e is the number of essential genes found in this set,
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Fig5_HTML.jpg
Figure 5

N e / N c in dependence on Θ D .

S c = {i|D i > Θ D },

N c = #S c ,

S e = {i|i S c and i is essential},

N e = #S e .

The highest values found by this comparison are over 40%. A natural question arising now is if this occurred just by chance or is this high coverage unlikely to happened accidentally. Figure 6 provides information regarding this question. There we show p D in dependence on Θ D . The probability p D is the sum of a hypergeometric distribution p(k; N, N E , n) giving the probability to observe exactly k essential genes in a set of size n = N c D ) when the total number of genes is N containing N E essential genes.
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Fig6_HTML.jpg
Figure 6

p D in dependence on Θ D . The dotted line corresponds to 0.05.

Hence,
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Equ7_HTML.gif
(12)

is the probability to observe k or more essential genes in the set N c . From Fig. 6 one can see that for Θ D [8, 15] the probability p D < 0.05 (the dotted line corresponds to 0.05). This result suggests that the peaks observed in Fig. 5 do not appear by chance. Further, we obtain possible cut-off values for our gene set to be considered which corresponds to the interval Θ D [8, 15]. From these results we decided to choose https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq4_HTML.gif = 12 as cut-off value because for this value N e /N c assumes a maximum value. We call all genes for which D i > https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq4_HTML.gif holds fragile genes.

In table 1 we show a list of 12 genes found by setting https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq4_HTML.gif = 12 for which p D = 0.0039. These genes are ordered according to their out-degree d out in descending order. The first column gives the name of the gene. The second and third, the out- correspondingly in-degree of the gene in the TRN. The forth column gives the value of D i and the fifth column indicates if the gene is found to be essential (Y) or non-essential (N) according to [45]. As one can see, the first gene (YNL216W) is a hub because d out = 240. However, all other genes are not. Interestingly, YNL216W is not an essential gene according to [45]. The results in table 1 demonstrate that our measure does not prefer to select hub genes because only one hub was selected.
Table 1

Genes obtained for https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_IEq4_HTML.gif = 12.

gene

d out

d in

D i

essential

YNL216W

240

2

17.5

N

YKL043W

92

1

17.9

N

YML007W

89

2

13.0

N

YFR034C

73

2

14.0

N

YER040W

44

1

17.8

Y

YBR112C

26

1

13.6

Y

YPL177C

23

2

13.3

N

YOL148C

13

1

20.4

N

YGL207W

12

2

18.1

N

YLR399C

9

3

12.2

Y

YDR138W

5

1

15.6

Y

YPR072W

4

1

16.1

Y

The first column gives the name of the gene.

The second and third, the out-correspondingly in-degree of the gene in the TRN. The fourth column gives the value of D i and the fifth column indicates if the gene is found to be essential (Y) or non-essential (N) according to [45]. For this set p D = 0.0039.

This underlines the non-trivial characteristics of our measure. For reasons of completeness we show in table 2 the top four knockout genes that cause the largest influence, as measured by
Table 2

Top four knockout genes that have the largest impact on other genes.

gene

d out

d in

D k

essential

YML027W

314

2

26.8

N

YGL096W

248

0

97.4

N

YDL056W

129

0

149.8

Y

YHR206W

128

0

27.8

N

First column: gene name. Second column: out-degree. Third column: in-degree. Fourth column: D k . Fifth column: essential genes (yes (Y) or no (N)).

https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Equ8_HTML.gif
(13)

on other genes. The genes are again ranked according to their out-degrees. All of these genes are hubs. Considering the top 50 genes reveals that in this set 20 genes have an out-degree below 25 and even genes with an out-degree one and two are among these. Again, this demonstrates that hubness is no sufficient property to characterize these genes.

Next, we analyze the structure of the TRN containing our 12 genes shown in table 1. We find that each gene pair is connected (both ways) via a directed path. This implies that the subgraph formed from these 12 genes is part of the strongly connected component of the TRN. As side note we remark that for Θ D ≤ 9 the resulting set of genes is no longer strongly connected and that also for Θ D = 10 this gene set does not correspond to the entire strongly connected component of the whole TRN comprising N sc = 36 genes. Analysis of the strongly connected component of the whole TRN shows that it contains only 8 essential genes. From this we calculate the probability to find 8 or more essential genes by chance in the strongly connected component. By summing up the probabilities from a hypergeometric distribution we find p sc = 0.021. This shows that essential genes are enriched in the strongly connected component, however, due to p D D = 12) <<p sc the strongly connected component represents a less favorable set to identify essential genes than the set found by our method. Fig. 7 shows D ik for the strongly connected component. In contrast to Fig. 3 the influence of the perturbations is now much more severe as can be seen by the many non-blue dots. As a remark we want to remind that the diagonal of D ik is not defined as explained in the methods section.
https://static-content.springer.com/image/art%3A10.1186%2F1752-0509-3-35/MediaObjects/12918_2008_Article_303_Fig7_HTML.jpg
Figure 7

Asymptotic information change D ik for all genes in the strongly connected component of the transcriptional regulatory network.

Finally, we test for the transcriptional regulatory network if our measure and betweenness centrality are similar by calculating Spearman's rank sum correlation coefficient. For the genes in table 1 we obtain a correlation coefficient r = 0.0139 and a p-value of 0.965 indicating that the results of both measures are not correlated. Further, we find that among the top 100 ranked genes of both measures only two genes are selected by both measures.

Discussion

In table 3 we provide some information about the biological processes the genes in table 1 are involved in. All genes found by our measure belong to the category 'regulation of transcription, DNA-dependent'. Further, some additional categories are listed for each gene in table 3. It is apparent that involvement in transcription regulation is the dominating category. This is interesting because the genes listed in the bottom of the table have a fairly low out-degree (see table 1). From our results we hypothesize that the genes found by our method, who have not been declared 'essential' by [45], should be 'fragile' in the sense that they are influenceable quite severely by the malfunctioning of many other genes. Here, 'fragile' should not necessarily be equalized with 'essential' but the organism may be viable, however, it's overall function substantially impaired. It is also important to bear in mind that the list of essential genes used in our analysis is not necessarily complete. After intense literature research we found that YGL207W (also known as SPT16 – subunit of the heterodimeric FACT complex (Spt16p-Pob3p), facilitates RNA Polymerase II transcription elongation) is reported to be essential by [51] confirming our findings.
Table 3

Biological processes the genes provided in table 1 participate.

gene

biological process

YNL216W

chromatin silencing at telomere, ribosome biogenesis

YKL043W

pos. regulation of trans. from RNA polymerase II promoter

YML007W

regulation of trans. from RNA polymerase II promoter in response to oxidative stress

YFR034C

response to starvation,

YER040W

positive regulation of transcription

YBR112C

negative regulation of transcription, chromatin remodeling

YPL177C

cellular copper ion homeostasis

YOL148C

chromatin modification, mitotic cell cycle

YGL207W

DNA repair, chromatin modification

YLR399C

DNA repair, chromatin remodeling

YDR138W

response to DNA damage stimulus

YPR072W

protein ubiquitination

In addition to the listed biological processes, all genes are involved in the category 'regulation of transcription, DNA-dependent'.

On a mathematical note we want to remark that the fact that rank(D i ) and rank(d out ) respectively rank(d in ) are correlated, as shown in the beginning of the results section, does not imply that our measure approximates or is even identical to the ranking of the degrees. This can be seen in table 1 because, e.g., the five bottom genes have d out < 20, however, in the whole transcriptional regulatory network are 79 genes that have an out-degree larger than 20. But only seven of them appear in the list.

From a perspective of information processing the connection between asymptotic information change and local network structure represented by their degrees is interesting because it indicates that a local subgraph may be sufficient to study information processing in the overall network. This dissection is interesting because it would allow to reduce the computational complexity considerably that arises studying genomes like yeast or even organisms with more genes. In a former study [27], a similar idea has been proposed in a different methodological framework.

Finally, we want to remark that we repeated the analysis using D k = ∑ i D ik as fragility measure of genes. However, for D k we did not obtain meaningful results regarding the enrichment of essential genes. That means that the information captured by D ik is asymmetric, as one would expect from it's construction.

Conclusion

In this paper we analyzed the influence that single gene perturbations have on the asymptotic communication abilities of the transcriptional regulatory network of yeast [43, 44] to learn about the functional robustness of this network. To study the influence of the perturbations we used an information theoretic measure [36] and approximated the information propagation as a first order Markov chain directly defined for a given network topology. Our numerical studies obtained three major results. First, the asymptotic distributions for the perturbed and unperturbed network states carry implicitly information about their local origin from which the initial signal was transmitted. This confirms results previously found for synthetic networks [36]. Second, using our measure of asymptotic information change we could demonstrate that the predicted set of fragile genes contains a statistically significant enrichment of so called essential genes that are experimentally found to be necessary to ensure vital yeast. Third, a structural analysis of the transcriptional regulatory network revealed that there are significant differences between fragile genes, hub genes and genes with a high betweenness centrality value.

In addition to these findings we consider it to be important to emphasize that we employed graph theoretical, information theoretical as well as statistical methods [52] because the biological information processing in gene networks is unlikely to be treated correctly in a deterministic framework. This demonstrates the power of interdisciplinary approaches and is at the heart of computational systems biology.

Declarations

Acknowledgements

We would like to thank Michael Drmota for fruitful discussions.

Authors’ Affiliations

(1)
Computational Biology and Machine Learning, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast
(2)
Center for Mathematics, Probability and Statistics, University of Coimbra

References

  1. Emmert-Streib F, Dehmer M, : Analysis of Microarray Data: A Network Based Approach. 2008, Wiley VCHGoogle Scholar
  2. Alon U: An Introduction to Systems Biology: Design Principles of Biological Circuits. 2006, Chapman & Hall/CRCGoogle Scholar
  3. Kitano H: Foundations of Systems Biology. 2001, MIT PressGoogle Scholar
  4. Palsson BO: Systems Biology. 2006, Cambridge University PressView ArticleGoogle Scholar
  5. von Bertalanffy L: General System Theory: Foundation, Development, Application. 1968, New York, George BrazillerGoogle Scholar
  6. Watts D, Strogatz S: Collective dynamics of 'small-world' networks. Nature. 1998, 393: 440-442. 10.1038/30918View ArticlePubMedGoogle Scholar
  7. Watts D: Small Worlds: The Dynamics of Networks between Order and Randomness. 1999, Princeton University PressGoogle Scholar
  8. Albert R, Jeong H, Barabasi AL: Diameter of the world wide web. Nature. 1999, 401: 130-131. 10.1038/43601.View ArticleGoogle Scholar
  9. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks. Nature. 2000, 407: 651-654. 10.1038/35036627View ArticlePubMedGoogle Scholar
  10. Albert R, Barabasi A: Statistical Mechanics of Complex Networks. Rev of Modern Physics. 2002, 74: 47-97. 10.1103/RevModPhys.74.47.View ArticleGoogle Scholar
  11. Newman MEJ: The Structure and Function of Complex Networks. SIAM Review. 2003, 45: 167-256. 10.1137/S003614450342480.View ArticleGoogle Scholar
  12. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierarchical Organization of Modularity in Metabolic Networks. Science. 2002, 297: 1551-1555. 10.1126/science.1073374View ArticlePubMedGoogle Scholar
  13. Schwikowski B, Uetz P, Fields S: A network of protein-protein interactions in yeast. Nat Biotechnol. 2000, 18: 1257-1261. 10.1038/82360View ArticlePubMedGoogle Scholar
  14. Soinov LA, Krestyaninova MA, Brazma A: Towards reconstruction of gene networks from expression data by supervised learning. Genome Biology. 2003, 4: R6- 10.1186/gb-2003-4-1-r6PubMed CentralView ArticlePubMedGoogle Scholar
  15. Bornholdt S, Schuster H, : Handbook of Graphs and Networks: From the Genome to the Internet. 2003, Wiley-VCHGoogle Scholar
  16. van Noort V, Snel B, Huymen MA: The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO reports. 2004, 5 (3): 280-284. 10.1038/sj.embor.7400090PubMed CentralView ArticlePubMedGoogle Scholar
  17. Friedman N: Inferring cellular networks using probabilistic graphical models. Science. 2004, 303 (5659): 799-805. 10.1126/science.1094068View ArticlePubMedGoogle Scholar
  18. Margolin A, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006, 7: S7- 10.1186/1471-2105-7-S1-S7PubMed CentralView ArticlePubMedGoogle Scholar
  19. Werhli A, Grzegorczyk M, Husmeier D: Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and bayesian networks. Bioinformatics. 2006, 22 (20): 2523-31. 10.1093/bioinformatics/btl391View ArticlePubMedGoogle Scholar
  20. Barabasi AL, Oltvai ZN: Network Biology: Understanding the Cell's Functional Organization. Nature Reviews. 2004, 5: 101-113. 10.1038/nrg1272View ArticlePubMedGoogle Scholar
  21. Kitano H: Biological robustness. Nat Rev Genet. 2004, 5: 826-37. 10.1038/nrg1471View ArticlePubMedGoogle Scholar
  22. Stelling J, Sauer U, Szallasi Z, Doyle F, Doyle J: Robustness of Cellular Functions. Cell. 2004, 118: 675-685. 10.1016/j.cell.2004.09.008View ArticlePubMedGoogle Scholar
  23. Kitano H: Towards a theory of biological robustness. Mol Syst Biol. 2007, 3: 137- 10.1038/msb4100179PubMed CentralView ArticlePubMedGoogle Scholar
  24. Albert R, Jeong H, Barabasi AL: Error and attack tolerance of complex networks. Nature. 2000, 406: 378-482. 10.1038/35019019View ArticlePubMedGoogle Scholar
  25. Li F, Long T, Lu Y, Ouyang Q, Tang C: The yeast cell-cycle network is robustly designed. Proc Natl Acad Sci USA. 2004, 101 (14): 4781-6. 10.1073/pnas.0305937101PubMed CentralView ArticlePubMedGoogle Scholar
  26. Ingolia N: Topology and robustness in the Drosophila segment polarity network. PLoS Biol. 2004, 2 (6): e123- 10.1371/journal.pbio.0020123PubMed CentralView ArticlePubMedGoogle Scholar
  27. Klemm K, Bornholdt S: Topology of biological networks and reliability of information processing. PNAS. 2005, 102 (51): 18414-18419. 10.1073/pnas.0509132102PubMed CentralView ArticlePubMedGoogle Scholar
  28. Wilhelm T, Behre J, Schuster S: Analysis of structural robustness of metabolic networks. Syst Biol. 2004, 1: 114-120. 10.1049/sb:20045004.View ArticleGoogle Scholar
  29. Kauffman S: Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology. 1969, 22: 437-467. 10.1016/0022-5193(69)90015-0View ArticlePubMedGoogle Scholar
  30. Kauffman S: Origins of Order: Self-Organization and Selection in Evolution. 1993, Oxford University PressGoogle Scholar
  31. McAdams HH, Arkin A: Stochastic Mechanisms in Gene Expression. Proc Natl Acad Sci USA. 1997, 94: 814-819. 10.1073/pnas.94.3.814PubMed CentralView ArticlePubMedGoogle Scholar
  32. Tu Y, Stolovitzky G, Klein U: Quantitative noise analysis for gene expression microarray experiments. Proc Natl Acad Sci USA. 2002, 99: 14031-6. 10.1073/pnas.222164199PubMed CentralView ArticlePubMedGoogle Scholar
  33. Wagner A: Robustness against mutations in genetic networks of yeast. Nature Genetics. 2000, 24: 355-61. 10.1038/74174View ArticlePubMedGoogle Scholar
  34. Wagner A: Robustness, Neutrality, and Evolvability. FEBS Letters. 2005, 579: 1772-1778. 10.1016/j.febslet.2005.01.063View ArticlePubMedGoogle Scholar
  35. Wagner A: Robustness and Evolvability in Living Systems. 2007, Princeton University PressGoogle Scholar
  36. Emmert-Streib F, Dehmer M: Fault Tolerance of Information Processing in Gene Networks. Physica A. 2009, 388: 541-548. 10.1016/j.physa.2008.10.032.View ArticleGoogle Scholar
  37. Norris J: Markov Chains. 1998, Cambridge University PressGoogle Scholar
  38. Gallager R: Information Theory and Reliable Communication. 1968, Wiley, New YorkGoogle Scholar
  39. MacKay D: Information theory, inference and learning algorithms. 2003, Cambridge University PressGoogle Scholar
  40. Shannon C, Weaver W: The Mathematical Theory of Communication. 1949, University of Illinois PressGoogle Scholar
  41. Dehmer M, Emmert-Streib F: The Structural Information Content of Chemical Networks. Zeitschrift für Naturforschung A. 2008, 63a: 155-158.Google Scholar
  42. Dehmer M, Borgert S, Emmert-Streib F: Entropy Bounds for Hierarchical Molecular Networks. PLoS ONE. 2008, 3 (8): e3079- 10.1371/journal.pone.0003079PubMed CentralView ArticlePubMedGoogle Scholar
  43. Luscombe N, Badu M, Yu H, Snyder M, Teichmann S, Gerstein M: Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004, 431: 308-312. 10.1038/nature02782View ArticlePubMedGoogle Scholar
  44. Yu H, Kim P, Sprecher E, Trifonov V, Gerstein M: The importance of bottlenecks in protein networks: Correlation with gene essentiality and expression dynamics. PLoS Computational Biology. 2007, 3 (4): e59- 10.1371/journal.pcbi.0030059PubMed CentralView ArticlePubMedGoogle Scholar
  45. Giaever G, et al.: Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002, 418: 387-391. 10.1038/nature00935View ArticlePubMedGoogle Scholar
  46. Kullback S, Leibler R: On information and sufficiency. Ann Math Stat. 1951, 22: 79-86. 10.1214/aoms/1177729694.View ArticleGoogle Scholar
  47. Kullback S: Information theory and statistics. 1959, Wiley, New YorkGoogle Scholar
  48. Sheskin DJ: Handbook of Parametric and Nonparametric Statistical Procedures. 2004, RC Press, Boca Raton, FL, 3Google Scholar
  49. Dijkstra E: A note on two problems in connection with graphs. Numerische Math. 1959, 1: 269-271. 10.1007/BF01386390.View ArticleGoogle Scholar
  50. Deutscher D, Meilijson I, Schuster S, Ruppin E: Can single knockouts accurately single out gene functions. BMC Systems Biology. 2008, 2: 50- 10.1186/1752-0509-2-50PubMed CentralView ArticlePubMedGoogle Scholar
  51. Orphanides G, et al.: The chromatin-specific transcription elongation factor FACT comprises human SPT16 and SSRP1 proteins. Nature. 1999, 400: 284-288. 10.1038/22350View ArticlePubMedGoogle Scholar
  52. Emmert-Streib F, Dehmer M, : Information Theory and Statistical Learning. 2008, Springer, New YorkGoogle Scholar

Copyright

© Emmert-Streib and Dehmer; licensee BioMed Central Ltd. 2009

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.