When predicting interactions between TFs and their target genes based on gene expression profile, a key assumption is that mRNA expression level is informative in the prediction of protein activity. Although expression levels between mRNAs and their corresponding proteins in different cell types exhibit a range of correlations for different genes [23], an overall positive correlation between mRNA and protein expression levels has been identified [24, 25], therefore, we adopt this strategy in our study.
The NCA requires two inputs: a time series of gene expression profiles and a pre-defined regulatory network. The original gene expression data are obtained from the Arabidopsis Information Resource (TAIR) and Gene Expression Omnibus (GEO) of NCBI. They cover seven A. thaliana pollen developmental stages with 23 profiles in total for wild type Columbia (Col-0): uninucleate microspores (UM), bicellular pollen (BP), tricellular pollen (TP), mature pollen (MP), hydrated pollen grains (HP), 0.5 hours germinated pollen tubes (0.5 hr), and 4 hours germinated pollen tubes (4 hr). Those datasets of pollen developmental stages were generated by three labs [14–16], each of which includes at least one MP sample as control. In order to make comparison between datasets from different labs, the MP sample from that lab is used as the control to process the related dataset, and only the fold change values of each gene from each dataset is kept for the future calculation.
The insufficiency of the availability and comparability of A. thaliana pollen development expression data limit the power of NCA. To overcome the limitation, besides we take the mature pollen expression data as the control from the same experiment, we also collect the pollen development-related transcription factors from the Database of Arabidopsis Transcription Factors (DATF), The Arabidopsis Gene Regulatory Information Server (AGRIS), and the Plant Transcription Factor Database (PlnTFDB).
In NCA, the pre-defined regulatory network initially accounts for the gene expression response. The regulatory relationships between the transcription factors and their target genes can be collected from published literatures and transcriptional factors related databases [26]. From the three databases mentioned above, we collect 2, 283 transcription factors which can be mapped to microarray probes. We also collect 8 interaction pairs between transcription factors specific for A. thaliana pollen development through text-mining. However, the interaction data between transcription factors and their target genes in pollen development is very limited. Therefore, we have not enough prior interactions available for NCA. To overcome this limitation, we use the microarray data to explore the potential regulatory interactions according to the correlation coefficient (r) of each pair of transcription factors and the fold change (FC) of each gene under different conditions. We choose those gene pairs with correlation coefficient |r|>0.9 and the genes with |FC|>1.6. To reduce false positive data, all differentially expressed genes (DEGs) are hierarchically clustered by FC values, and those genes with high correlation are grouped into corresponding clusters. The resulting clusters indicate that all the genes under a cluster can be regulated by the related TF. Taking the correlation coefficient as control strength for NCA, we define a matrix of regulatory relationships between the selected TFs and their target genes, and generate a regulatory network for the pollen development. The regulatory network includes 289 transcription factors, 5530 target genes and 429, 790 regulatory relations. Processed by NCA, we obtain 15 TFs and 101 target genes. Because of the inability of NCA to predict the regulatory pattern of transcription factors, we take the positive correlation between TF and its target genes as positive regulation, and negative correlation as negative regulatory relation. Based on the network and the expression data, we further estimate the activities of the transcription factors in the network over pollen development with NCA and characterize the dynamic regulatory network. NCA decomposes the matrix of gene expression values into two matrixes, one matrix represents the influence of a transcription factor on a target gene and another reflects the activities of transcription factor [9].
Transcription factor activities under different pollen development stages
The activities of 15 TFs clearly show stage-specific actions in pollen and pollen tube development. 12 of them (AT4G17490, AT5G43990, AT5G05410, AT5G04760, AT3G49530, AT5G03510, AT3G63360, AT4G26440, AT3G20670, AT3G24500, AT1G01720, AT1G52520) are activated during pollen development, while the genes for the rest 3 TFs (AT3G63350, AT4G00130, AT3G04100) remain relatively high expression without significant change (Figure 1). AT4G17490 (ATERF6) gene, encoding the ethylene responsive element binding factor 6 [22], belongs to AP2-EREBP gene family and shows its maximum activity at 0.5 hr with a slight decrease at 4 hr in pollen tube development. Previous research has indicated that members in AP2-EREBP gene family play a role in floral organ identity determination [27]. AT5G43990 (SUVR2) gene, its product acting as a histone-lysine N-methyltransferase/zinc ion binding factor [22], is expressed during the fourth anthesis [28], reaching its peak expression at TCP stage and returning to baseline at 4 hr stage during pollen development. SUVR2 is one of SUVR family protein, which can act in concert to achieve various functional H3K9 methylation states that will eventually lead to DNA methylation in a locus-specific manner (Mutskov and Felsenfeld 2004). The up-regulation of SUVR gene in the specific stage of pollen development indicates the involvement of histone remodification in the gene expression switch and regulation rewiring at the epigenetic level during the process. Gene AT5G05410 (DREB2A) is expressed in pollen tube cell, and its activity steadily increased from BCP to HP. DREB2A is an important transcription factor that has been confirmed to involve in heat or water stress-inducible gene expression of A. thaliana. It specifically interacts with cis-acting dehydration-responsive element/C-repeat (DRE/CRT), thus functions in cold and drought stress-responsive gene expression in A. thaliana [29]. The expression pattern of DREB2A gene indicates that some cold and drought stress related biological processes are also involved in the pollen tube cell development and growth. AT5G04760 (MUK11.7) expression is detected in germinated pollen grain and pollen tube cell, and exhibits a sharp increase from MP to HP stage. AT5G03510 (F12E4.290), a C2H2-type zinc finger family protein, changes its gene expression from HP stage. As a member of heat stress transcription factor family, AT3G63350 (HSFA7B) has been shown to be expressed during the fourth anthesis stage [28], and down-regulated at BCP, HP stage and eventually return to its base level. AT3G62260 gene (T17J13.220, encoding a protein phosphatase 2c family protein), which expression has been reported during the fourth anthesis stage as AT3G63350 does [28], is turned on at TCP stage. AT3G49530 (NTL6), auto-stimulated in pollen tube cell development [30], is up-regulated at HP stage. AT4G26440 (ATWRKY34, a member of WRKY transcription factor family), which gene expression has been detected in anther and pollen tube cell [28], is activated at BCP. Its gene expression has been confirmed as pollen specific [31–33]. AT4G00130 (F6N15.6) gene presents a rapidly reduced activity from BCP to HP and a sharp increase from HP to 0.5 hr stage. AT3G20670 (HTA13) gene, which is expressed in pollen tube cell, increases its expression steadily from UNM to HP stage. AT3G24500 (MBF1C) is a key regulator of a coordinated heat stress-response network involving SA-, trehalose- and ethylene-signaling pathways, and its gene is expressed in pollen tube; its expression is steadily increased from BCP and reaches its peak at HP stage. AT3G04100 (ATAF1) belongs to a large family of putative transcriptional activators with NAC domain; its expression is detected in pollen tube cell and deactivated from BCP stage. As the same family as AT3G04100 with NAC domain, AT1G01720 (ATAF1) gene also shows its expression in pollen tube cell, it is steadily up-regulated from BCM and reaches its peak expression at HP stage. ATAF1 has been proposed to modulate plant ABA signaling and high ATAF1 expression has been considered to contribute to ABA hypersensitivity in Arabidopsis [34]. AT1G52520 (FRS6), which potentially acts as positive regulators in phyB signaling pathway controlling flowering time [35], is steadily up-regulated from UNM and reaches its peak expression at 0.5 hr stage.
The correlations between gene expressions for transcription factors and their activities are not identical among all the transcription factors. Five transcription factors (AT4G17490, AT5G03510, AT3G62260, AT1G52520, AT3G20670) present strong positive correlation between their activities and expressions (r > 0.5), when three transcription factors (AT5G43990, AT5G04760, AT4G26440) show strong negative correlation (r < -0.5). However, the rest seven TFs (AT3G63350, AT5G05410, AT3G49530, AT3G24500, AT3G04100, AT1G01720, and AT4G00130) display less consistence or no correlation at all (|r|< 0.5).
Since the linear model of gene expression upon which NCA rests does not reveal the relationships between transcription factors, we search all the transcription factor pairs with high correlation (|r|> 0.5) from the protein-protein interactions catalogued in the A. thaliana Protein Interactome Database [36]. However, no protein-protein interaction has been recorded for any pair of the 15 TFs. Although no experimental data confirms the direct interactions between those TFs, the high correlations between some TFs under different development states suggest their possible relations. Interestingly, the correlation matrix between transcription factor activities reveals that two sets of TFs' activities are apparently positively correlated. One set includes 6 TFs: HSFA7B, AT3G62260, FRS6, ERF6, AT4G00130, and AGL57, another includes WRKY34, AT3G04760, SUVR2. Although no experimental data supports that the TFs in each set form direct interaction, the results inferred from NCA represent an indirect evidence of the interaction or cooperation among them.
Regulatory influence matrix and gene expression clustering
According to the assumptions of NCA, the target gene expression is controlled by an adjusted strength matrix and the transcription factor activities. The assigned quantitative values of the adjusted strength are able to be used to obtain more biologically meaningful clusters than by using target genes' expression. Based on their expressions, the target genes are hierarchically clustered with the adjusted strengths of transcription factors (Figure 2A). In total, eleven major clusters are identified (Additional file 1), which represents the coordinated actions of transcription factors to regulate the gene expression. Cluster 4, 7, 8, and 9 highlight the influence of single TF on a set of genes, whereas cluster 3, 11, 10, and 5 display a set of TFs influence on a set of genes. Interestingly, the regulatory relationships from the clusters can also disclose the auto-regulation of the transcription factors. For example, in the cluster 4, it reveals that the gene AT3G04100 (AGL57), which encodes a MADS-box family protein, is also a target of its own protein, and the same as AT1G01720 in cluster 8, AT1G01720 in cluster 9 and AT4G00130 in cluster 12, as well as AT5G43990, AT5G04760, AT3G63350, AT3G49530 in cluster 3. Those self-regulations are unable to be identified from the coexpression approach. NCA shows certain advantages and the auto-regulation can be inferred from clustering on the matrix of regulation influence.
On the other hand, clustering by regulatory strength can identify new clusters unobtainable by clustering the expression data alone. For example, cluster 9 and 5 could not be distinguished when clustering is applied to the gene expression data alone (Figure 2B). In contrast, those two groups can be separated with clustering on the regulatory strength matrix, and are linked to the regulatory influence of transcription factor DREB2A, HTA13 and NTL6. For the target genes FZR2 and SVR1, they cannot be grouped together with the clustering method on the gene expression data alone (Figure 2B), but they are grouped into cluster 3 based on regulatory strength and supposedly regulated by transcription factors SUVR2, AT5G04760, HSFA7B, AT3G62260, NTL6, HTA13, MBF1C, and FRS6. Furthermore, the clustering of the NCA-processed strength matrix adjusted from the initial connectivity matrix can group genes with different expression patterns (Figure 2A and 2C).
Our results further demonstrate that the estimated transcriptional regulation strengths have certain advantages over the gene coexpression approaches for exploring the regulatory relationships and can provide a new insight to the regulatory relations of between transcription factors and their target genes.
Coexpression analysis of the regulatory gene sets
Each pair of TF and its target gene(s) classified by NCA have a high correlation coefficient (|p|>0.9) based on gene expression. Considering that our identified regulatory relationships between each TF and its target genes are derived only from process of pollen development, we further test the robustness of the coexpression under other conditions, such as tissue, abiotic and light conditions. We explore each pair of the TFs and its target gene(s) inferred from NCA in ATTED [37] which is a database of gene coexpression in Arabidopsis under a wide variety of experiment conditions, and find 65 coexpression pairs (in total 472 identified pairs) with correlation coefficient larger than 0.4 (|r|> 0.4), including 8 TFs and 35 target genes. Almost a quarter (15/65) of these coexpressions are negative. Since the rest 407 TF and target gene pairs display the low correlation under all other experimental conditions but show a high correlation in pollen development process, it is reasonable to state that those pairs could be specific in pollen development. There are 5 clusters with more than one TF in each cluster. We search the coexpression for those TFs in each cluster, and find 9 pairs of TFs to present the relatively significant coexpression (in total 15 TFs; |r| >0.4). Almost all pairs of those coexpressed TFs are positively correlated, except one pair between At5g04760 and At5g43990 in cluster 3 (r = -0.41). In addition, we also search every pair of target genes in each cluster for the coexpression, and find 118 coexpression pairs with 6 highly correlated ones (r>0.8), which implies that the rest 112 pairs of coexpression genes in each cluster could be specific in the related stage of pollen development process.
The regulatory dynamics of pollen development
According to the relationships inferred from NCA, we built an integrated model of A. thaliana pollen development (Figure 3). The final dynamic network integrates the inferred transcription factor activities, the regulatory relationships between TFs and their target genes, clustering on the adjusted strengths, the gene expression profiles, and the text-mining data. The network includes 19 TFs and 101 target genes. Several transcription factors present their specific dynamic expression pattern during the pollen development. For example, the expression of AT5G04760 is not detectable during UNM development stage, while AGL18, OFP1, TSO1 and MYB65 are not expressed during TCP, HP, 0.5 hr and 4 hr development stages. The rest genes present their expression during all of the pollen development processes and display different expression at least ones.
AT5G04760 is found no expression at UNM stage. From UNM to BCP stage, AT5G04760 is activated and interacts with SUVR2 to regulate their downstream gene expression. In contrast, AGL57 is deactivated during the stage switch. By the end of BCP stage, AT5G04760 and AGL57 have already executed their function and affected gene expression, including the genes in clusters 3, 4, 5, 10, and 11. From BCP to TCP stage, all genes show trends of not differently expressed. The pollen in TCP stage is similar to MP stage since the number of DEGs detected in both stages is very small. For transcription factors AGL18, OFP1, TSO1, and MYB65, they are curated to play the roles in pollen development from literature and therefore incorporated in the regulatory network. Those transcription factors show no detectable expression until into the 4 hr stage. Another transcription factor, DREB2A, is dramatically deactivated from the beginning. After TCP stage, DREB2A keeps steadily activated; until HP stage, it begins to restore to their basal level of activity. The temporal model therefore provides a global view of TFs' activation and the regulatory relationships between TFs and their target genes during the pollen development of A. thaliana.
The transcription networks have been proven to be made up of a small set of recurring regulation patterns that are called network motifs, and they serve as basic building blocks of transcription networks. To obtain the regulation pattern during pollen development, we detect network motifs in the network. In total, we retrieve 11 network motifs for motif size 3, 82 motifs with motif size 4, and 778 motifs with motif size 5. Each motif embodies a regulation pattern. And most all of the TFs display different roles in more than one regulation pattern. We detect the network motifs for all pollen development stage and find some interesting TF interactions (Figure 4).
For example, MBF1C, which expresses in pollen tube and enhances the tolerance to various biotic and abiotic stresses [38], displays the pattern of up-regulates AT3G62260 and down-regulates NTL6. AT3G62260 functions as protein serine/threonine phosphatase activity and NTL6 undergoes proteolytic processing. Our result indicates that MBF1C regulates protein serine/threonine phosphorylation and proteolysis in the opposite direction. Since phosphorylation plays an important role in the pollen-stigma interaction [39] and AT3G62260 is upregulated before TCP stage, it can be anticipated that MBF1C promotes the pollen-stigma recognition.
According to the network motif, WRKY34 upregulates other 3 target genes: FER3, RHD2 and GRP4 in the pollen development. FER3 has been reported to protect cells against oxidative damage [40] and RHD2 can lead the formation of reactive oxygen species [41], whereas overexpression of GRP4 can increase plant tolerance to osmotic stress [42].
Therefore, as a gene solely expressed in pollen, WRKY34 potentially promotes the expression of FER3, RHD2 and GRP4, which may function as a module to balance the reactive oxygen species metabolism during the process.