There has been an upsurge in publications attempting to uncover genome-wide transcriptional control structures using machine learning strategies. We believe approaches such as the one presented here have two-fold use: i) They allow researchers with interest in particular genes or regulators to find in silico support for their hypotheses. ii) They demonstrate genome-wide properties of the transcriptional network. We were able to find known interactions in the data, and we expect predicted interactions to be a valuable resource for experimentalists when designing experiments.
The basic idea in the present work is to identify genes whose responses over time due to treatment are similar. However, such similarity naturally depends on the measure of similarity used. As demonstrated here, it is possible to focus on a particular process of the system, like the cell cycle in budding yeast, by choosing a measure of similarity which divides the genes into classes that are known to be related to the process of interest. The cell cycle has some special characteristics that make defining such a measure easy. However, one may very well define such measures with natural semantics for other biological settings. For instance, in an infection study one may be interested in finding regulatory descriptors corresponding to different stages of the infection. By designing a detector that specifically identifies genes active at different stages we would expect more relevant classes to be found than by relying on clustering. Of course, designing such detectors can become quite cumbersome. An interesting direction of future research would be to use hidden Markov models for dividing the genes into different groups. Such models allow incorporation of prior knowledge about the dynamics of the process and have been successfully applied to gene expression data.
One notable exception to the use of clustering analysis is Tsai et al., where transcription factors were defined as cell cycle-related if the genes they were regulating had a significantly different expression level in at least one of five phases of the cell cycle compared to one or more of the other phases. Although this is an interesting approach, we were fascinated by the large discrepancies between sets of genes detected as periodic in S. cerevisiae for different synchronization methods. Our work lends support to the notion that periodic expression is conditioned on different stimuli. Using descriptors of possible cis-regulatory elements, we were able to find regulatory combinations specific to different sets of synchronization methods. It should be noted that the rationale for using different synchronizations in the experiments by Spellman et al. was to be able to identify a signal that is assumed to be common across the experiments, corresponding to the physiological pattern of expression in the cell cycle. In this study we instead acknowledged that different synchronization methods may correspond to different environments (internal and/or external) in which the cells propagate, and sought to explain why genes have different expression patterns with respect to periodicity under these conditions. The fact that it is possible to find combinations of transcription factors and sequence motifs that are significantly more common among genes within our classes supports this hypothesis. However, the expression data were only available for the first two periods of the S. cerevisiae cell cycle, so we cannot claim that the differences in periodic behavior are long-lived. In fact, it is intuitively more appealing to regard these differences as temporary in the sense that the effects of different initializations of the cell cycle (i.e. synchronization methods) will die out in the long run.
The fact that we group genes according to periodicity in three different synchronization experiments, and not according to conventional expression similarity, means that we cannot expect all genes within a periodic class to be regulated by the same mechanism. Indeed, what we see is many different mechanisms describing different subsets of genes within each periodic class. In principle, we could have subdivided our periodic classes into cleaner regulatory modules based on, for example, the time of peak expression. However, with several of the periodic classes already containing very few genes, a more practical approach was to let the rule method arrive at this subdivision automatically based on the available cis-regulatory information and the periodic classes.
The periodic classes are inferred exclusively via computational analysis of expression data, and no biological experimental validation has been performed. The class division will depend on the specific thresholds of detection. The results presented here are based on a scheme known as "classification with rejection", where genes for which neither outcome is supported are rejected from further analysis. We also attempted a class division with a single-sided criterion (using only criterion B as introduced earlier), classifying genes as periodic if s > 0.95 and non-periodic otherwise. Using this single criterion for classification we found the Gene Ontology term "regulation of cyclin-dependent protein kinase activity" (GO:0000079) to be overrepresented in class 011, the set of genes detected as periodically expressed only in the cdc-experiments. This was encouraging since the cdc-based synchronizations act by interfering with different cyclin-dependent protein kinases. However, as expected, such a criterion renders a class distribution that is skewed towards class 000. This reduces the chances of extracting rules describing the regulatory mechanisms of the (relatively) few representatives in the periodic classes.
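The difference between the two class divisions discussed above can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes a single periodicity score s per gene and experiment and uses two thresholds on that score as a stand-in for the two-criteria rejection scheme. Only the threshold 0.95 comes from the text; the lower threshold `NONPERIODIC_MAX`, the function names, and the toy scores are hypothetical.

```python
# Sketch only: two thresholds on one score stand in for the rejection scheme.
PERIODIC_MIN = 0.95     # from the text: s > 0.95 counts as periodic
NONPERIODIC_MAX = 0.05  # hypothetical: the rejection threshold is not stated here


def call_with_rejection(s):
    """Return '1' (periodic), '0' (non-periodic), or None (rejected)."""
    if s > PERIODIC_MIN:
        return "1"
    if s < NONPERIODIC_MAX:
        return "0"
    return None  # neither outcome is supported


def call_single_sided(s):
    """Return '1' if s > 0.95 and '0' otherwise; nothing is rejected."""
    return "1" if s > PERIODIC_MIN else "0"


def class_label(scores, call):
    """Concatenate per-experiment calls into a label such as '011'.

    Returns None if the gene is rejected in any experiment.
    """
    calls = [call(s) for s in scores]
    if any(c is None for c in calls):
        return None
    return "".join(calls)


# Invented scores for three synchronization experiments.
toy_scores = {
    "gene_a": (0.99, 0.98, 0.97),
    "gene_b": (0.01, 0.99, 0.96),
    "gene_c": (0.50, 0.99, 0.96),
    "gene_d": (0.02, 0.40, 0.01),
}

with_rejection = {g: class_label(s, call_with_rejection) for g, s in toy_scores.items()}
single_sided = {g: class_label(s, call_single_sided) for g, s in toy_scores.items()}
```

In this toy example the ambiguous genes (`gene_c`, `gene_d`) are dropped under rejection but are forced into classes 011 and 000 under the single-sided criterion, which is the mechanism behind the skew towards class 000 noted above.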
We also attempted the use of only motifs or only transcription factors as descriptors. Results were similar: the class division based on detected periodicity was more specific towards cell cycle regulators than the class division based on clustering. However, class-specific overrepresentation of cis-regulatory elements (i.e. motifs or transcription factors) was weaker, indicating an advantage of using the novel cis-regulatory descriptors based on both sequence motifs and actual transcription factor binding. A further improvement of these descriptors would be to include proximity and order of the sequence motifs in the promoter regions; however, such information was not utilized in this study.
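Class-specific overrepresentation of a descriptor can be quantified in several ways. The sketch below uses a one-sided hypergeometric test, which is one common choice and not necessarily the test used in this study. It also assumes, purely for illustration, that a gene carries a combined descriptor when it has both the sequence motif and binding evidence for the transcription factor. All gene identifiers and set contents are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the descriptor."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical toy data: ten genes, one motif, one transcription factor.
all_genes = {f"g{i}" for i in range(10)}
motif_genes = {"g0", "g1", "g2", "g3", "g4"}  # promoter contains the motif
bound_genes = {"g0", "g1", "g2", "g3", "g9"}  # binding evidence for the factor
class_genes = {"g0", "g1", "g2"}              # genes in one periodic class

# Assumption: the combined descriptor requires both kinds of evidence.
descriptor_genes = motif_genes & bound_genes

p_value = hypergeom_upper_tail(
    k=len(descriptor_genes & class_genes),
    K=len(descriptor_genes),
    n=len(class_genes),
    N=len(all_genes),
)
```

In practice such p-values would need correction for the number of descriptors and classes tested.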
The class-specific hierarchical structure of the discovered cis-regulatory descriptor combinations is an example of general system-wide properties discovered by our method. Genes in class 111 have the largest fraction of combinations where smaller subsets are associated with more restricted periodic expression. Note that due to the classification with rejection criterion discussed above, this hierarchy is not a trivial result of the class structure: genes in class 111 are neither a subset of, nor periodically similar to, genes in class 110. In fact, genes in class 110 have a low probability of being periodic in the third experiment. Hence, the hierarchy suggests that the subsets of cis-regulatory descriptors are sufficient for periodic expression of the genes in fewer conditions. From an evolutionary standpoint it may be advantageous to ensure periodic expression of vital components in a wide variety of conditions by building in redundancy and including many cis-regulatory elements. One alternative way to regulate periodic expression would have been to use only one phase-specific mechanism in genes that are always periodically expressed and to block periodic transcription in the appropriate conditions. However, this model has no support in the data and can be excluded.
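The kind of hierarchy described above can be checked mechanically once each descriptor combination is associated with a class. The following sketch assumes that "more restricted" means periodic in a strict subset of the conditions of the larger combination; that reading, the function names, and the placeholder descriptors `D1` to `D4` are assumptions for illustration and do not reproduce the rules found in this study.

```python
def periodic_conditions(label):
    """Indices of the experiments in which a class label such as '110' is periodic."""
    return {i for i, bit in enumerate(label) if bit == "1"}


def nested_pairs(rules):
    """Yield (subset, superset) descriptor combinations where the subset's class
    is periodic in strictly fewer conditions, all shared with the superset's class."""
    for big, big_label in rules.items():
        for small, small_label in rules.items():
            if small < big and periodic_conditions(small_label) < periodic_conditions(big_label):
                yield small, big


# Hypothetical rules: descriptor combination -> associated periodic class.
rules = {
    frozenset({"D1", "D2", "D3"}): "111",
    frozenset({"D1", "D2"}): "110",
    frozenset({"D1"}): "100",
    frozenset({"D4"}): "011",
}

pairs = list(nested_pairs(rules))
```

Here the combination containing `D4` contributes no pair because it is not a subset of any other combination, whereas the chain `D1`, `D1+D2`, `D1+D2+D3` forms the nested structure.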