Amount of information carried by a message from the phytohormone receptor to a gene effector
In contrast to the classical views of signaling pathways as simple relay systems, biochemical and cell biological experiments indicate that intracellular signaling mechanisms involve dense networks of interacting molecules in which information from the cell environment is processed before it reaches the nucleus. An information theory approach can help us understand how this incoming message from the cell or ER surface is processed and transmitted into the nucleus under intracellular conditions in which numerous proteins interact.
In this paper, we have presented a novel approach to understanding how information is managed in the ethylene signal transduction pathway, which is fundamental for plant responses to environmental cues. In the present case, the transfer of information from the membrane to the nucleus is indirect because the response is based on the inactivation of CTR1 and downstream molecules. In such a system, we have been able to address the question of how much information the communication channel can manage. We have achieved this by calculating the probability of ERF1 gene expression for a given amount of ethylene applied to the root cell, and using this result to determine how much information the ethylene-ERF1 system handles at a given time. Our implementation (Eq. 1) let us use the Shannon entropy definition (Eq. 3) to determine the uncertainty associated with the flow of information through this communication channel, from the ER-embedded ethylene receptor to the ERF1 gene in the nucleus. We then used equation (5) to calculate the amount of information that is associated with the activation of ERF1.
According to Figure 1, when the probability of expression of ERF1 is 0, the cell has a minimum Shannon entropy and a maximum amount of information from its environment because the CTR1 module is switched on and the EIN3 module is switched off. The root features dependent on auxin are fully expressed, and the ARF2 gene is expressed. As the ethylene concentration increases, the probability of expression of ERF1 increases, but the amount of information decreases because the fraction of activated EIN3 molecules is insufficient to completely counterbalance the effects of the CTR1 module, and the auxin response is reduced but not eliminated.
When the probability of ERF1 expression is 0.5, half of the auxin-dependent characteristics have been disabled, but the full ethylene response has not yet been expressed. At this point, the system manages the minimum information value and the maximum Shannon entropy or uncertainty value. This situation corresponds to cases in which the system must discern between two possibilities but does not have sufficient information to make a decision. This may correspond to a bifurcation point in the phase space where the system is equally likely to take one pathway or another.
For ethylene concentrations above ~1 μL/L, the probability of ERF1 expression is greater than 0.5 and the phenotypic characteristics associated with the triple response of etiolated seeds gradually dominate the auxin-dependent characteristics. Above 10 μL/L, the ethylene-dependent communication system manages the maximum amount of incoming information from the external cell environment (~0.5 mers) and exhibits the full response to ethylene.
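The relation between the probability of ERF1 expression and the Shannon entropy described above can be illustrated with a minimal sketch. It assumes that ERF1 is treated as a two-state variable (expressed or not expressed) and that entropy is measured in natural-log units; the function name `binary_entropy` is illustrative, and this is the standard textbook form rather than a reproduction of the paper's Eq. 3.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy (natural-log units) of a two-state variable with P(on) = p.

    Assumption: the gene has only two states, expressed and not expressed.
    """
    if p in (0.0, 1.0):
        return 0.0  # by convention, p * ln(p) -> 0 as p -> 0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

# Entropy vanishes at the two extremes and peaks at p = 0.5.
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"P(ERF1 expressed) = {p:.1f}  ->  H = {binary_entropy(p):.4f}")
```

Under these assumptions the uncertainty is zero when the outcome is certain (probability 0 or 1) and largest when the two outcomes are equally likely, which is the qualitative behavior described for Figure 1.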
Figure 2 shows that this behavior of the communication channel leads to a potential-like curve when the sigmoid dose-response graph  is replaced with a dose-information graph. This last curve is symmetric near its minimum value and it becomes extremely asymmetric as the ethylene dose increases or decreases. Thus, as the ethylene concentration increases, the rate of information per unit of ethylene concentration rapidly falls until the minimum is reached and then rapidly increases until a maximum value is attained. At least until 10 μL/L, however, the amount of information that the ethylene-dependent communication channel carries is always less than the information that the channel carries in the absence of ethylene. This may be due to the fact that the effect of ethylene requires the prior inactivation of the ETR and the indirect activation of the ERF1 genetic machinery.
From equation (1) we have [...], and from the definition of I [...] we get: [...]
Thus, in mathematical terms, the characteristics of the curve in Figure 2 can be written as:
at ET ~0.5 μL/L. Equation (28) indicates that for the first time, we can measure the amount of information that a given hormone carries into a genetic communication channel and that this dependence is non-linear and follows a potential-like curve.
In summary, we have shown that our approach allows us to evaluate, in several different ways, how a cellular communication channel can manage its information flow. First, we explored the amount of information released into the system by different concentrations of an agonist that are received at the ER or cell surface. It is possible that a given concentration of agonist conveys a given message involving a specific amount of information, up to the saturation of the receptor. Second, we explored how much of the total amount of information released by the agonist reaches the nucleus. This amount represents the real capacity of the channel to transmit information from the encoder with fidelity. It is possible that cells use mechanisms such as amplification, redundancy, and splitting of the message to ensure that all of the contents of the message reach the nucleus. We were also able to determine the effector's response to the information in the message transduced from the membrane. The effector should read the correct message in order to induce the correct output. The effect of noise (which is a general term for anything that tends to produce errors in transmission) should be minimized as much as possible in order to avoid mistakes while reading and translating the perceived messages. Thus, there should be molecular mechanisms that ensure that the message sent from the receptor is interpreted correctly in the nucleus. Finally, if a message is sent from a surface receptor, there should be a code to translate it into a genetic response. We know how the genetic code is translated into a specific protein. However, we do not know how cells encode information from the activation or inactivation of surface receptors into an appropriate gene expression profile via signal transduction pathways. Such an encoding mechanism would explain how genotypically identical cells behave differently in different environments. In this paper, we propose a novel approach to investigate this question.
The possible code used by the ethylene communication channel
Assume that there are N specific ethylene receptors embedded in the ER membrane, and denote the maximum activation level of each receptor under steady-state conditions by 1 and the inactivated state by 0. Then, when the occupancy level of the ethylene receptors is 0%, we have the N-length code (1, 1, ..., 1), which corresponds to the outcome in which ERF1 is not expressed (probability of expression 0) in the probabilistic space for gene expression. When the ethylene concentration is above 10 μL/L, the level of activation of the receptors is ~0, so that the code (0, 0, ..., 0) of M zeros, with M ≃ N, corresponds to the outcome in which ERF1 is fully expressed (probability of expression 1) in the probabilistic space for gene expression. In both cases, H_ERF1 = 0, as expected.
The fraction f of inactivated receptors is given by the steady-state solution of the differential equation at a given concentration of ethylene ET (see Additional file 2). The quantities entering this solution are the concentration of ethylene-bound receptors, the total concentration of ethylene-specific receptors in the ER membrane, the dissociation constant of the receptor (k_d), and the concentration of free ethylene (ET).
The average k_d value used in earlier work is 6 × 10⁻⁵ μM = 0.00148 μL/L. The reported value for the ETR1 receptor in transgenic yeast expressing the ETR1 gene is k_d = 0.036 μL/L. The reported apparent dissociation constant for the hypocotyl-growth response is ~0.11 μL/L. The k_d values of ETR families 1 and 2 are reported to be very similar.
With the more precise value of k_d = 0.036 μL/L, a receptor concentration of R ≈ 0.3 μM with respect to the ER volume, and the assumption that only receptors of ETR families 1 and 2 are present, the fraction of inactive receptors in the presence of ~1 μL/L of ethylene is f ≈ 0.07, or 7%. Thus, when the probability of ERF1 expression is 0.5, the possible input code consists of (1-f)N = 0.93N active receptors (93%) and fN = 0.07N inactive receptors (7%), that is, an N-length code containing 0.93N ones and 0.07N zeros. For example, if N = 100 and we assume that the order of the 1's and 0's in the code is important, there are 100!/(93! 7!) ≈ 1.6 × 10¹⁰ possible codes compatible with this outcome in the probabilistic gene expression space. If the order is not important, i.e. the system responds only to the temporal aspect of the signal, we have only one code. In this case, H_ERF1 attains its maximum value (see Figure 1).
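The number of order-sensitive codes for the N = 100 example can be checked with a short sketch. It assumes that a code is an arrangement of 7 zeros (inactive receptors) among 100 positions, so the count is a binomial coefficient; the variable names are illustrative and not taken from the original.

```python
import math

N = 100                       # illustrative number of receptors, as in the text
f = 0.07                      # fraction of inactivated receptors quoted in the text
n_inactive = round(f * N)     # 7 receptors in state 0

# Number of distinct N-length codes with exactly n_inactive zeros,
# assuming the position of each 0 and 1 matters.
ordered_codes = math.comb(N, n_inactive)
print(ordered_codes)          # 16007560800, i.e. about 1.6e10
```

If the order is ignored, all of these arrangements collapse into the single code defined by the proportion of inactive receptors.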
In this case, when the communication channel responds only to the temporal aspects of the external signal, there can be a one-to-one relationship between the proportion of inactivated receptors (i.e., the intensity of the signal) and the outcome in the probabilistic gene expression space:
As we mentioned before, once the signal has been encoded it has to be transmitted to the nucleus through a noisy channel. This channel consists of the CTR1-MAPK module and its negative effect on the EIN2 molecule. The message carried by the ethylene concentration should be transmitted with fidelity to the nucleus, i.e. the amount of EIN3 activated molecules should be proportional to the intensity of the signal, which is measured by the proportion of inactivated ETRs.
Information flow in response to a sinusoidal hormonal input
The cell's internal noise consists of all the processes that could alter the transmission and content of information of the signal from the agonist receptor to its target through a given signaling pathway. If we assume an internal noise level value of ξ, then the message will be reproduced with fidelity 1-ξ. Another interesting question arises at this point: how does the system ensure the fidelity of the signal in a noisy environment? One possible answer arises from the chemical structure of the communication channel: the particular combination of rate constants and concentration of signaling molecules will have the necessary noise-filtering properties for the communication channel. In a previous paper, we used in silico experiments on the frequency distribution response to show that the filtering properties of the ERF1 communication channel are able to eliminate extremely low and extremely high noise frequencies, which can alter events downstream of the ERF1 gene.
Plants secrete ethylene in a nearly circadian cycle, with the maximum level of ethylene released during the day and the minimum level at night. In previous work, we performed a series of in silico experiments in which we varied the frequency of a sinusoidal input of ethylene to explore how the system responds to periodic rhythms with contrasting frequencies. In this work, we repeated these experiments to learn how the system reads an incoming message from the environment consisting of variations in the frequency of an ethylene input signal (see Figure 3). Thus, while a slower frequency signal is read as an oscillating flow of information (Figure 3a), high frequency inputs are translated into a message with an approximately constant amount of information. Furthermore, there is a window of frequency inputs for which a message from the outside contains the maximum amount of information. Figure 4 shows that this frequency window exhibits a zero information state followed by the maximum information state, coinciding with the natural circadian behavior. Although it is difficult to find a natural phenomenon that follows an exact sinusoidal pattern of intensity fluctuations, the in silico experiment shown here suggests that circadian rhythms can transiently cut off the information flow from a particular communication channel (a signaling pathway) while opening the information flow from an alternative communication channel. This switch between two alternative information flow regimes can depend, as we pointed out before, on the structural features of each signaling pathway. In the case analyzed here, the balance between the values of k for the activation of the ETR1/2 family of receptors can determine the amplitude of the maximum frequency response window of the ethylene-signaling pathway.
In this frequency response window, the gain of the system (G), which is measured by the log₁₀ of the amplitude of the outcome signal (the amplitude of the oscillations in the concentration of ERF1 protein in the nucleus) with respect to the amplitude of the incoming signal (the amplitude of the sinusoidal wave of ethylene) [Figure 5a], tends to -∞ at an angular frequency of ω = 0.005 s⁻¹. In contrast, the value is -5.59 dB at an angular frequency of ω = 0.0005 s⁻¹. This means that the machinery of protein synthesis can effectively reduce the amplitude of the oscillations up to ~60 times while maintaining the frequency of the input signal; in other words, the response is linear under steady-state conditions.
As shown in Figure 5b, a circle in 3D space can represent this peculiar behavior of this signaling pathway. The three axes in this space represent the main features of the communication channel for two different values of ω: the flow of information, the probability of expression of the ERF1 gene, and the amount of ERF1 protein accumulated in the nucleus as a result of ERF1 expression. In this representation, it becomes clear that the system distinguishes between the two oscillation regimes of the incoming signal, thus giving rise to two different forms of the output signal, each with different information.
From Figure 5b, it is also clear that when the oscillating input signal has an angular frequency of ~0.0005 s⁻¹, the time between the minimum and the maximum values of the circle can be used to estimate the time needed for the protein synthesis machinery to recover from a ~50% decrease in its activity. The amplitude of the peak is ~1.3 nM and the recovery time is approximately 2.5 h, so that in 9000 s, an expected total of ~423 ERF1 molecules are produced, assuming that the nuclear volume is on the order of 540 μm³. This implies that the rate of protein synthesis is on the order of ~0.047 molecules/s; in other words, each ERF1 molecule is synthesized and returned to the nucleus in ~21.3 s.
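The molecule count and synthesis rate follow from the quoted peak concentration, nuclear volume, and recovery time. A minimal sketch of this arithmetic, taking the three quoted values as inputs (the variable names are illustrative and not from the original):

```python
AVOGADRO = 6.02214076e23         # molecules per mole

peak_conc_molar = 1.3e-9         # ~1.3 nM peak amplitude, from the text
nuclear_volume_um3 = 540.0       # nuclear volume in cubic micrometres, from the text
recovery_time_s = 2.5 * 3600.0   # 2.5 h expressed in seconds (9000 s)

# 1 cubic micrometre = 1e-15 litres
volume_litres = nuclear_volume_um3 * 1e-15

# Molecules corresponding to the peak concentration in the nuclear volume
molecules = peak_conc_molar * volume_litres * AVOGADRO

# Average synthesis rate and time per molecule over the recovery period
rate_per_s = molecules / recovery_time_s
seconds_per_molecule = 1.0 / rate_per_s

print(f"molecules ~ {molecules:.0f}")
print(f"rate ~ {rate_per_s:.3f} molecules/s")
print(f"time per molecule ~ {seconds_per_molecule:.1f} s")
```

These values agree with the ~423 molecules, ~0.047 molecules/s, and ~21.3 s per molecule quoted above.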
During this recovery time, the amount of information increases by ~0.5 mers, which means that each new molecule of ERF1 protein carries 0.0018 mers [0.0026 bits ≈ 2.6 millibits (mb)] of information into the nucleus at a rate of 8.45 × 10⁻⁵ mers/s (1.22 × 10⁻⁴ bits/s ≈ 0.1 mb/s) in the presence of periodic ethylene stimulation with ω = 0.0005 s⁻¹. In this form, the steady linear properties of the communication channel can be used to estimate the amount of information transferred into the nucleus for each new molecule of protein synthesized. In addition, once it becomes possible to measure these rates within single cells, the predictions of the model presented here may be tested experimentally and used to improve the model.
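The unit conversions quoted above can be checked with a short sketch. It assumes that the "mers" unit is a natural-logarithm unit of information (1 unit = 1/ln 2 bits). This is an inference from the quoted ratio 0.0026/0.0018, not something stated in the text. The per-molecule value and the synthesis rate are taken as given inputs, and the variable names are illustrative.

```python
import math

info_per_molecule_nat = 0.0018   # information per ERF1 molecule, as quoted
synthesis_rate = 0.047           # molecules per second, as quoted

# Assumption: the quoted unit is a natural-log unit, so 1 unit = 1/ln(2) bits
NAT_TO_BIT = 1.0 / math.log(2)

info_per_molecule_bit = info_per_molecule_nat * NAT_TO_BIT
rate_nat_per_s = info_per_molecule_nat * synthesis_rate
rate_bit_per_s = rate_nat_per_s * NAT_TO_BIT

print(f"{info_per_molecule_bit:.4f} bits per molecule")
print(f"{rate_nat_per_s:.2e} natural-log units per second")
print(f"{rate_bit_per_s:.2e} bits per second")
```

Under this assumption the sketch gives about 0.0026 bits per molecule and about 1.22 × 10⁻⁴ bits/s, matching the quoted figures to within rounding.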
Interaction of the ERF1 gene with downstream genes
From the results section, the event hls1 implies that the total Shannon entropy of the channel is determined by H_ERF1, because I(erf1; hls1) = H_ERF1. This result means that in the case of one dependent gene, the total Shannon entropy in the communication channel is completely determined by the Shannon entropy associated with the expression of the master gene. We can express this statement as a mathematical proposition:
Proposition 1. Define the event hls1 as the gene HLS1 being in its expressed state due to the expression of the ERF1 gene, and the event erf1 as the master gene ERF1 being in its expressed state, i.e. hls1 [...]
Proposition 2. Define the event hls1 as the gene HLS1 being in its expressed state due to the expression of the ERF1 gene, and the event arf2 as the gene ARF2 being in its expressed state. If these events are such that hls1 ∩ arf2 = ∅ and [...]
The arguments that provide support for these propositions are found in the results section. The propositions put forward here are extremely important for understanding how the ethylene communication channel is built. The hierarchical structure of the channel is revealed when we use a probabilistic description of the genetic expression of the system instead of a deterministic one. By defining the degree of expression of the genes considered in the simulated system as a probability, we introduce a certain degree of uncertainty that can be measured using the Shannon entropy function.
We postulate that the decoder of the information carried by the ethylene concentration is the master gene ERF1 and thus that the entropy associated with the decoding of environmental information is bounded above by the value of H for this gene. This information decoding process causes a given number of ERF1 protein molecules to attach to the promoter sites of target genes with a GCC box and thereby trigger the ethylene response.
In this form, Proposition 1 and its corollary state that the uncertainty introduced in the communication channel by translation of the gene HLS1, which is expressed after ERF1, is due entirely to the decoding of the incoming message by ERF1. This proposition also implies that the translation of HLS1 cannot increase the level of uncertainty within the communication channel. In other words, the expression of a "slave" or dependent gene cannot produce a greater degree of uncertainty than that produced by the expression of its master gene when the incoming message created by a given hormone concentration is decoded.
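The content of Proposition 1 can be illustrated numerically with a minimal sketch. It makes a strong simplifying assumption that is not taken from the original: HLS1 is expressed exactly when ERF1 is expressed, so the dependent gene is a deterministic function of the master gene. The function names are illustrative, and entropies are in natural-log units.

```python
import math

def entropy(dist):
    """Shannon entropy (natural-log units) of a {state: probability} mapping."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal distribution of one gene from a joint {(erf1, hls1): p} mapping."""
    out = {}
    for states, p in joint.items():
        out[states[index]] = out.get(states[index], 0.0) + p
    return out

def channel_entropies(p_erf1):
    # Assumption: HLS1 is on if and only if ERF1 is on (deterministic dependence),
    # so only the states (1, 1) and (0, 0) carry probability.
    joint = {(1, 1): p_erf1, (0, 0): 1.0 - p_erf1}
    h_erf1 = entropy(marginal(joint, 0))
    h_hls1 = entropy(marginal(joint, 1))
    h_joint = entropy(joint)
    mutual_info = h_erf1 + h_hls1 - h_joint   # I(erf1; hls1)
    return h_erf1, h_hls1, h_joint, mutual_info

h_erf1, h_hls1, h_joint, mutual_info = channel_entropies(0.3)
print(h_erf1, h_hls1, h_joint, mutual_info)
```

Under this assumption the joint entropy and the mutual information both equal H_ERF1, so the dependent gene adds no uncertainty beyond that of the master gene. A weaker dependence would have to be checked against the statement in the results section.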
Proposition 2 states that the mutually exclusive expression of the two antagonist genes HLS1 and ARF2 does not produce more entropy than that produced during the expression of either of their master genes. Although these propositions are inspired by limited and preliminary results and are applicable at this point only to the ethylene communication channel, they provide novel guides for studies of other signaling pathways in the future. They suggest that master genes may be responsible for the precise decoding of messages from the cell environment in order to guarantee certain precise responses to a signal even in noisy environments.