The regulation of flowering time network
The state transition to flowering in plants is precisely controlled by environmental conditions and endogenous developmental cues so that plants produce their progeny under favorable conditions. The response to multiple factors suggests the existence of a complex network regulating this state transition in plants. In this study, the biology of flowering time (Photoperiodic) in Arabidopsis thaliana showed that a complex gene regulatory network that controls this transition integrates the responses based on various external and internal conditions. Consequently, the regulation of the flowering time has been a major adaptive trait during plant evolution and domestication. A large number of genes have been characterized as flowering time regulators, and several recent reviews have provided detailed descriptions of flowering time pathways [2].
Arabidopsis is a facultative or quantitative long day plant that can flower, albeit much later in a short day. Key regulatory genes appear to be conserved in Arabidopsis. A short day plant suggests that common pathways are utilized [16, 17]. The plant perceives photoperiod and is transduced to a downstream signaling system. The light-dependent pathway controls the flowering in response to seasonal changes.
Figure 5 shows the gene regulatory pathway for the flowering transition pathway in Arabidopsis[1], which is mediated by CONSTANS (CO):
-
1
CO codes for a zinc finger and CCT domain-containing transcription factor that accumulates under long day conditions in leaves as a result of the combination of the rhythmic expression of CO mRNA.
-
2
CO activates the expression of FLOWERING LOCUST (FT) probably by binding to the FT regulatory regions [18, 19]. The FT protein is a component of the mobile flowering signal that moves upon its expression in the vascular tissue of leaves to the shoot apex [20, 21].
-
3
At the meristem, FT physically interacts with the bZIP transcription factor FD and the FT/FD complex and activates the expression of SOC1 [22].
-
4
SOC1 and AGL24 form a positive feedback loop, and the two factors might form a complex for the up-regulation of LFY. Thus, the regulators of the floral transition form a small network with multiple interactions among themselves,
-
5
AP1 and LFY are ultimately resolved in the up-regulation of the floral meristem identity genes.
We used three different models for the flowering transition pathway in Arabidopsis to reconstruct the experimental data.
Experimental data
We obtained the experimental data from:
-
(1)
Plant Expression Database (http://www.plexdb.org/), ID: AT4;
-
(2)
NCBI GEO, ID: GSE576 and GSE577.
Both contain the microarray data of the eight genes included in our study. Between the two, AT4 discusses the flower development of Arabidopsis thaliana; it was controlled by several signaling pathways which converge on a set of genes such as FT and SOC1 that function as pathway integrators. The photoperiod is regarded the most important factor in promoting floral transition: Arabidopsis thaliana will flower earlier under long day conditions than under short day conditions. It is therefore considered a facultative long day plant. To monitor changes of gene expression during floral transition and early flower development, plants were grown under short day (9 hr light, 15 hr dark) for 30 days and then shifted to long day (16 hr light, 8 hr dark) to induce flowering. The RNA was isolated from micro-dissected apical tissue harvested 0, 3, 5, and 7 days after hybridized to the Affymetrix ATH1 microarray.
We not only analyzed the reopens of Columbia (col) and Landsberg erecta (ler) but also the effect of mutants in the flowering time genes CONSTANS (co) and FT (ft). In this study, we used four experiments for parameter estimation: (1) the Columbia (col) is the most widely-used wild type of Arabidopsis thaliana; (2) The Landsberg erecta (ler) is currently the second most widely-used accession of Arabidopsis thaliana; finally (3) CONSTANS (co) and (4) FT (ft) are mutants in the flowering time [6].
We used four different experimental data sets based on optimized parameter estimation to find the most appropriate model.
Dynamic model
Before introducing the transcription mechanism of the binding process, the following assumptions were made in advance: (a) Transcription is initiated when all the activation binding sites are occupied, and all the repression binding sites regarding the same gene are empty; and (b) The cell size remains constant during the time course of flowering state transition.
S-system
Most biological systems are nonlinear. Although the Michaelis-Menten model has been widely used to approach biological systems, one of the disadvantages lies in the fact that it is not an explicit mathematical form for all cases, especially for complex processes such as diffusion-reaction interactions. The S-system consists of a set of mathematical terms that is sufficient to capture most of the nonlinear phenomena in nature including diffusion-reaction interactions. The development of the S-system [23] has been shown to provide a good approximation for many cases, and there are efficient procedures for estimating the parameter values [24, 25]. Protein-DNA interactions such as the binding between a transcription factor and its target gene have been studied for many years. Experimental observations have promoted the proposition that the diffusion effects in 3-D and 1-D crawled along the DNA are critical in the binding process. Early studies have yielded an unexpected result that the binding rate for the Lac repressor protein to its binding site on DNA is approximately 100-1000 times faster than the maximal 3-D diffusion rate in solution [26]. This phenomenon is called facilitated diffusion [27]. A picture of facilitated diffusion is schematically shown in Figure 6.
The process can be described by the reaction
where TF represents the transcription factor, TF
ns
represents the non-specific absorbed transcription factor on the DNA, and BS represents the binding site. The first step in this reaction is absorption of the transcription factor on the DNA. The second step in this reaction is mRNA production after the absorbed transcription factor has bound to its target gene. The mRNA production rate can be formulated as [28]:
where λ is the average length of the transcription factor that moves along the DNA, L is the total length of the DNA, and τ is the sum of the 3-D diffusion time and the retention time on the DNA for the transcription factor.
The Langmuir isotherm is not suitable for describing protein adsorption, because the diverse conformations and multiple absorbed sites in the absorption process [29]. The Freundlich adsorption isotherm is more concordant with the experimental observations of proteins absorption. Proteins absorption is strongly dependent on the bulk concentration of proteins. In this case, the Freundlich adsorption isotherm of transcription factor on the DNA can be expressed as
where K and n are constants at a particular temperature. From equations (1) and (2), the mRNA production rate can be determined as:
It can be transformed to the mathematic form of the S-system. For this reason, we adopted the S-system function to describe the diffusion-reaction interactions in the mRNA transcription process. In the case of mRNA transcription, the S-system equation for the transcription rate is given by:
(4)
where n is the number of transcription factors, TF
j
is the j-th transcription factor that regulates gene i, α
i
and β
i
denote the positive rate constants, and g
ij
and h
ij
are referred to as the kinetic orders. If g
ij
> 0, gene j will induce the expression of gene i. On the contrary, gene j will inhibit the expression of gene i if g
ij
< 0. The variable h
ij
has the opposite effects in controlling the gene expression compared to g
ij
. In the present study, the range of g
ij
and h
ij
falls between 0 and 2.
Michaelis–Menten model
The Michaelis-Menten model describes a catalysed reaction in which an intermediate complex forms after binding between enzyme and substrate. Thereafter, the complex (TF-BS) converts to the product and enzyme. In the case of mRNA transcription, the transcription process via such mechanism may be represented as:
Under the condition [TF]> > [BS], the production rate of mRNA for a gene with the diffusion effect ignored can be formulated as:
(5)
where V
max
is the maximum production rate of mRNA and K
m
is the Michaelis constant. The delay effect on the mRNA production increases with the stability of the complex state.
Mass-action model
The chemical mechanism of Mass-action model states that the reaction rate is proportional to the product of the active mass of reactants. In the case of mRNA transcription, the reactants correspond to the transcription factor (TF) from the upstream gene and its specific binding site (BS) on the downstream gene. The transcription process in the cell may be represented as:
Because the total number of binding sites for a specific gene is fixed, the production rate of mRNA of the downstream gene with diffusion ignored can be formulated as:
where k is the rate constant and [TF] represents the concentration of the transcription factor. The delay effect on the mRNA production is assumed to be zero.
Parameter estimation
The objective of parameter estimation is to adjust the parameter values of a model via an optimization procedure so that the predictions based on the model can closely express the observation data. Parameter estimation can be performed through global methods and local methods [30]. However, one of the major challenges in modeling large-scale dynamic systems has been the existence of several local minima in the solution space. In this study, parameter estimation was performed by using the software tool “Complex Pathway Simulator” (COPASI Ver. 4.8) to fit the time series experimental data based on a dynamic model [31]. Evolutionary Programming (EP), Hooke & Jeeves (HJ) and Particle Swarm Optimization (PSO) were applied to search for an optimal solution, which may not converge to the minimum with different initial guesses. Among the three, EP [32, 33] was inspired by biological evolution, PSO [34] was inspired by social behavior and the movement dynamics of insects, birds and fishes, and HJ [35] was derived based on a hill climbing technique.
These algorithms possess key advantages in large inverse problems of quantitative mathematical models [36]. The goodness of fitting for each set of estimated parameters can be quantified by the least squared error O:
(7)
where p is the tested parameter set, n and t are the number of genes in the regulatory network and the number of samples in the time series data, respectively; Y
i,j
(p) is the prediction time series data by the dynamic model for the parameter set p; and X
i,j
is the experimental time series data. The weighting factor is given by the mean square: .
Model ranking and selection
In this study, we compared three models for the gene regulatory network of the flowering time regulation in Arabidopsis. We used the mean relative error (MRE) to quantify the response of each model subject to small perturbations [37]:
(8)
where x
i,j
denotes the experimental time series data for the i- th gene at time point j, y
i,j
is the simulation data for the i- th gene given by the model at time point j, n is the total number of genes, and t is the number of samples in the time series data.
Parameter sensitivity analysis
Dynamic models have been widely used to study metabolic networks and gene regulatory networks. These models are used to reconstruct experimental data and predict unobserved behaviors of a biological system. To address the many sources of uncertainty including error and noise in the experimental data, sensitivity analysis may be performed to identify the parameters in a model that have the strongest effect on the overall behavior.
Sensitivity analyses have the primary goal of determining how a given model responds to variations in a parameter. Local sensitivity analysis focuses on a particular point in the parameter space by changing one parameter at a time to obtain a local response of the model. Global sensitivity analysis tries to capture the entire parameter space all at once, allowing multiple parameters to be explored simultaneously [38].
In this study, we used the SBML-SAT software tool for Multi-Parametric Sensitivity analysis (MPSA) [39]. An MPSA analysis of the time dependent normalized sensitivity response is defined as:
(9)
where X
j
is the mRNA concentration of the j- th gene, and p
i
is the i- th parameter in the dynamic model.
Akaike information criterion
We compared three models for flowering time regulation in Arabidopsis. We used several parameter estimation methods to estimate the parameters of the dynamic models.
Parameter Estimation helps us quantify the regulatory abilities of the genes involved at the flowering time. In order to determine whether a dynamic model is optimal, in this study a statistical approach called Akaike Information Criterion (AIC) [40] was employed to validate the number of model parameters and determine the significance of the parameters.
The Akaike Information Criterion (AIC), which attempts to include both the estimated residual variance and the model complexity in one statistic, decreases as the residual variance S
e
decreases, and it increases as the number of parameters p increases. For a gene regulatory model with p regulatory parameters to fit with a dataset of N samples, the Akaike Information Criterion (AIC) can be written as [40]:v
where L is the likelihood of the mathematical model and p is the number of parameters in the model.