Pattern recognition is central to many scientific disciplines and is often a first step in building a model that explains the data. In particular, the study of periodic phenomena and frequency detection has received much attention, leading to the well-established field of spectral analysis.

Biology is rich with (near) periodic behaviour, with sustained oscillations in the form of limit cycles playing important roles in many diverse phenomena, such as glycolytic metabolism, circadian rhythms, mitotic cycles, cardiac rhythms, hormonal cycles, population dynamics and epidemiological cycles [1].

A conventional method for frequency detection is Fourier analysis. It is based on the fact that a broad class of functions can be represented as a sum of sines and cosines, and the Fourier Transform (FT) uses this property to reveal the underlying components present in a signal [2]. Fourier theory has given rise to a wide range of diverse developments and far-reaching applications, demonstrating the theory's importance and impact. For frequency detection, however, it is known that the FT works optimally only for uniformly sampled, long, stationary time series. Furthermore, common procedures for pre-processing the data can cause problems. Time series can contain low-frequency background fluctuations or drift that are unrelated to the signal of interest, which for the FT must be removed using detrending techniques. As has been shown previously, such detrending amounts to a convolution of the signal that can both remove evidence for periodicity and add false patterns [3]. Another known problem is aliasing: if a signal containing high frequencies is recorded with a low sampling rate, peaks at high frequencies can fold back into the frequency spectrum, appearing as low frequencies [2]. The Gibbs phenomenon provides another example where spurious peaks appear in a FT; it occurs at points of discontinuity in a periodic function and results in so-called ringing artefacts around the "true" frequency peak [4].
As for the accuracy of the frequency estimate, the output of a FT gives no direct information about it, since the sharpness of the peaks depends on a combination of factors, such as noise levels and the length of the time series. For further details, see the extensive FT literature (e.g. [2, 5]).
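The aliasing effect is easy to reproduce numerically. In the sketch below (the frequencies and sampling rate are arbitrary choices for illustration), a 9 Hz sinusoid sampled at only 10 Hz, well below its Nyquist rate of 18 Hz, produces a spectral peak that folds back to |9 − 10| = 1 Hz:

```python
import numpy as np

# A 9 Hz sinusoid sampled at 10 Hz (Nyquist frequency = 5 Hz): the
# high-frequency peak folds back and appears at 1 Hz in the spectrum.
fs = 10.0
t = np.arange(0, 10, 1 / fs)              # 100 samples over 10 s
signal = np.sin(2 * np.pi * 9.0 * t)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
peak = freqs[np.argmax(spectrum)]
print(peak)                               # 1.0, not 9.0
```

The recorded samples are indistinguishable from those of a 1 Hz sinusoid, which is exactly why aliasing cannot be undone after the fact.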

Bayesian inference provides another approach for analysing data (for an introduction to Bayesian analysis, see [14]). It addresses additional aspects of the problem, such as the inherent uncertainty of the analysis and the effects of external noise. Using this framework [3], the method of Bayesian Spectrum Analysis (BSA) was developed by Bretthorst [15] and applied to Nuclear Magnetic Resonance (NMR) signals and parameter estimation with great success [16, 17].

There are several advantages of the Bayesian approach, including an inherent mechanism for estimating the accuracy of the result and all parameters, as well as the ability to compare different hypotheses directly. Focus is shifted to the question of interest by integrating out other parameters and noise levels. Initial knowledge of the system can be incorporated in the analysis and expressed in the prior probability distributions. There has been a recent flood of Bayesian papers with some convincing applications and promising developments in systems biology (see [18–30], and many others). The Bayesian approach to time series analysis has proven its value in fields such as NMR and ion cyclotron resonance analysis (e.g. [31] and [32]).

In this paper, we describe the development, implementation and testing of an approach coupling Bayesian model development with BSA and Nested Sampling, in a biological context. We present a comparison of this approach with the FT, applied to a number of simulated test cases and two types of biological time series that present challenges to accurate frequency detection. We first present some necessary background, upon which we build to develop our approach.

### Bayesian Spectrum Analysis

Our presentation in this section follows closely that of Bretthorst [15]. The aim is to infer the most probable frequency (or frequencies), ω, from the given data. We start by building a model (the hypothesis *H*) for the observed data, parameterised by angular frequency, ω, and amplitudes, **c**, and then use Bayes' rule to compute the posterior probability of the parameters, *P*(ω, **c**|*D*, *H*, *I*). By assigning priors to the model parameters **c** and integrating over these, we arrive at the posterior probability for the parameter of interest, ω,

$$P(\omega \mid D, H, I) = \int P(\omega, \mathbf{c} \mid D, H, I)\, \mathrm{d}\mathbf{c}. \tag{1}$$

This is referred to as the marginal posterior probability of ω. We note that ω is an *r*-tuple, $\{\omega_1, \omega_2, \ldots, \omega_r\}$, with as many elements as there are distinct frequencies in the data.
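Numerically, marginalisation amounts to integrating the joint posterior over the nuisance directions. The toy sketch below uses a made-up joint density (all numbers are hypothetical) to show the operation:

```python
import numpy as np

# Toy joint density over (omega, c), standing in for the joint posterior;
# the peak location (omega = 2, c = 1) and widths are made up for illustration.
omega = np.linspace(0, 4, 401)
c = np.linspace(-3, 3, 301)
W, C = np.meshgrid(omega, c, indexing="ij")
joint = np.exp(-((W - 2) ** 2) / 0.02 - ((C - 1) ** 2) / 0.5)

# Integrate out the nuisance amplitude c (simple Riemann sum over the grid).
marginal = joint.sum(axis=1) * (c[1] - c[0])
marginal /= marginal.sum() * (omega[1] - omega[0])   # normalise

print(omega[np.argmax(marginal)])                    # peak near omega = 2
```

The marginal curve carries the uncertainty in ω with the amplitude averaged over, rather than fixed at a single best-fit value.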

A general model for observed data sampled at *N* discrete time points, $D = \{d(t_1), \ldots, d(t_N)\}$, includes the signal of interest, $s(t_i)$, a possible background function, $g(t_i)$, and the noise present in the system, $e(t_i)$,

$$d(t_i) = s(t_i) + g(t_i) + e(t_i). \tag{2}$$
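As a concrete, hypothetical instance of the decomposition $d(t_i) = s(t_i) + g(t_i) + e(t_i)$, simulated data with an oscillatory signal, a slow linear drift and Gaussian noise can be generated as follows (every value here is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.linspace(0, 20, 200)              # N = 200 discrete time points
s = 2.0 * np.sin(2 * np.pi * 0.5 * t)    # signal of interest (0.5 Hz)
g = 0.1 * t                              # background drift, independent of omega
e = rng.normal(0, 0.3, t.size)           # Gaussian noise
d = s + g + e                            # observed data
```

Data of this form, where the drift and noise compete with the oscillation, is the setting in which the background and noise terms of the model earn their keep.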

The signal function will usually be unknown and may be complicated, but can be approximated by a linear combination of $m_s$ model functions, $\psi_j$, that we parameterise by the quantity of interest, ω:

$$s(t_i) = \sum_{j=1}^{m_s} a_j\, \psi_j(\omega, t_i), \tag{3}$$

in which the $a_j$ are the expansion coefficients.

Similarly, the background function, $g(t_i)$, can be approximated by a set of $m_g$ functions, $\zeta_j$, that are independent of ω,

$$g(t_i) = \sum_{j=1}^{m_g} b_j\, \zeta_j(t_i), \tag{4}$$

where the $b_j$ are the background model function expansion coefficients.

Since **a** and **b** are not the main focus of the analysis, we will aim to integrate them out of the equations by marginalisation. Parameters that are treated in this manner are often referred to as nuisance parameters; here we collect them in the single amplitude vector $\mathbf{c} = \{a_1, \ldots, a_{m_s}, b_1, \ldots, b_{m_g}\}$. Although the signal function depends on ω, whereas the background function does not, for notational purposes we introduce the set of model functions, $\phi_j$, which consists of both the $\psi_j$ and the $\zeta_j$. This allows us to condense the model equation into

$$f(t_i) = \sum_{j=1}^{m} c_j\, \phi_j(\omega, t_i), \tag{5}$$

such that now $d(t_i) = f(t_i) + e(t_i)$, and *m* is now the total number of model functions, $m_s + m_g$. The model functions will typically not be orthogonal over the time series domain; orthogonality can, however, be achieved by Cholesky decomposition, and an orthogonal basis is used in all subsequent calculations. From Bayes' rule, the joint probability distribution of the model parameters
ω and **c** is

$$P(\omega, \mathbf{c} \mid D, H, I) = \frac{P(\omega, \mathbf{c} \mid H, I)\, P(D \mid \omega, \mathbf{c}, H, I)}{P(D \mid H, I)}. \tag{6}$$
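The Cholesky-based orthogonalisation mentioned above can be sketched numerically. Given the model functions sampled at the time points, the matrix $\Phi_{jk} = \sum_i \phi_j(t_i)\,\phi_k(t_i)$ is generally not diagonal; writing $\Phi = LL^{\mathsf T}$ with $L$ lower triangular, the rows of $L^{-1}\phi$ are orthonormal over the sampling times. The model functions and trial frequency below are arbitrary choices for illustration:

```python
import numpy as np

t = np.linspace(0, 10, 100)
w = 1.3                                            # arbitrary trial frequency
phi = np.stack([np.cos(w * t), np.sin(w * t), np.ones_like(t)])  # m = 3

Phi = phi @ phi.T                                  # Phi_jk, not the identity
L = np.linalg.cholesky(Phi)                        # Phi = L L^T
phi_on = np.linalg.solve(L, phi)                   # orthonormalised functions

print(np.allclose(phi_on @ phi_on.T, np.eye(3)))   # True
```

Working in this orthonormal basis is what lets the amplitude integrals later in the derivation decouple into independent one-dimensional Gaussians.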
The likelihood function, *P*(*D*|ω, **c**, *H*, *I*), is calculated by comparing data produced by the model signal, equation (5), to real experimental data. If the model perfectly captures the signal, the difference between the model data and the real data is simply the noise in the system. The model of the data in equation (2) includes noise, $e(t_i)$, which we assume to be time independent in the further developments. The true noise level is unknown, but for a given noise power, $\sigma^2$, the principle of maximum entropy leads to the use of a normal distribution,

$$P(e(t_i) \mid \sigma, H, I) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\!\left(-\frac{e(t_i)^2}{2\sigma^2}\right). \tag{7}$$

A noise model of this form ensures that the accuracy of the results is maximally conservative for a given noise power. We will later integrate over all possible noise levels to remove the dependence on *σ*. With the described signal and this noise model, the likelihood was calculated by Bretthorst [15] to be

$$P(D \mid \omega, \mathbf{c}, \sigma, H, I) \propto \sigma^{-N} \exp\!\left(-\frac{N Q}{2\sigma^2}\right), \tag{8}$$

where *N* is the number of data points, and

$$Q = \overline{d^2} - \frac{2}{N} \sum_{j=1}^{m} c_j \sum_{i=1}^{N} d(t_i)\, \phi_j(\omega, t_i) + \frac{1}{N} \sum_{j=1}^{m} \sum_{k=1}^{m} c_j c_k\, \Phi_{jk},$$

where $\overline{d^2}$ is the mean-square of the data, $\overline{d^2} = \frac{1}{N} \sum_{i=1}^{N} d(t_i)^2$, and $\Phi_{jk}$ is the matrix of the model functions, $\Phi_{jk} = \sum_{i=1}^{N} \phi_j(\omega, t_i)\, \phi_k(\omega, t_i)$.
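In Bretthorst's form of the likelihood, the exponent contains a quadratic form $Q$ that is nothing more than the mean-square residual between data and model, $\frac{1}{N}\sum_i [d(t_i) - f(t_i)]^2$, expanded in the model-function basis. A quick numerical check of that identity, with made-up data and amplitudes:

```python
import numpy as np

rng = np.random.default_rng(2)

N = 50
t = np.linspace(0, 5, N)
w = 2.0                                            # arbitrary trial frequency
phi = np.stack([np.cos(w * t), np.sin(w * t)])     # model functions (m = 2)
c = np.array([1.5, -0.7])                          # arbitrary amplitudes
d = rng.normal(0, 1, N)                            # arbitrary data

d2_bar = np.mean(d ** 2)                           # mean-square of the data
h_raw = phi @ d                                    # sum_i d(t_i) phi_j(t_i)
Phi = phi @ phi.T                                  # matrix of model functions
Q = d2_bar - (2 / N) * c @ h_raw + (1 / N) * c @ Phi @ c

residual = d - c @ phi                             # d(t_i) - f(t_i)
print(np.isclose(Q, np.mean(residual ** 2)))       # True
```

This makes the role of the likelihood transparent: for fixed σ, it rewards parameter values that shrink the mean-square misfit.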

The goal of the analysis is to compute the posterior probability for frequencies in the data, i.e. to go from the joint probability distribution to a posterior probability of ω, independent of the other parameters. By integrating over all possible values of the parameters *σ* and **c**, the remainder is the marginal posterior of the parameters of interest, $\omega = \{\omega_1, \omega_2, \ldots, \omega_r\}$. This is an essential advantage of the Bayesian framework, allowing the analysis to focus on estimating the parameters of interest, regardless of the values of the others. If necessary, the other parameters can be estimated at a later point.

To integrate over the σ and **c** values, priors must first be assigned to them. We chose uniform priors for **c** and ω, representing complete lack of knowledge. We know that *σ* is continuous and must be positive, and in such cases a Jeffreys prior is appropriate, *P*(*σ*|*I*) = 1/*σ*. Both the uniform distribution over continuous variables and the Jeffreys prior are known as improper priors if bounds are not specified, as they cannot be normalised. For more information on prior assignment see [3, 15].
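The scale invariance that motivates the Jeffreys prior for σ can be verified in one line: under a rescaling of units, $\sigma \to k\sigma$ with $k > 0$, the prior measure is unchanged,

$$P(\sigma \mid I)\, \mathrm{d}\sigma = \frac{\mathrm{d}\sigma}{\sigma} = \frac{\mathrm{d}(k\sigma)}{k\sigma},$$

so the prior expresses the same state of knowledge on every scale, which is the appropriate statement of ignorance for a positive scale parameter.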

Using the general model, equation (5), assigning the priors, calculating the likelihood function, equation (8), and integrating out the amplitudes and noise parameters, the posterior probability distribution of ω is proportional to

$$P(\omega \mid D, H, I) \propto \left[1 - \frac{m\, \overline{h^2}}{N\, \overline{d^2}}\right]^{\frac{m-N}{2}}, \tag{9}$$

where *h* is the projection of the data onto the orthonormal model functions, $h_j = \sum_{i=1}^{N} d(t_i)\, \phi_j(\omega, t_i)$, and $\overline{h^2}$ is the mean-square of the $h_j$, $\overline{h^2} = \frac{1}{m} \sum_{j=1}^{m} h_j^2$ [15]. This expression of the posterior allows us to identify the strongest frequencies present in the data. For a good model, there will be a high-probability peak in the posterior distribution at $\boldsymbol{\omega} = \{\omega_1, \omega_2, \ldots, \omega_r\}$.
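Putting the pieces together, the marginal posterior above can be evaluated on a grid of trial frequencies. The sketch below assumes a single-frequency model with cosine and sine model functions and simulated data; the signal frequency, noise level and grid are arbitrary choices, and this is an illustration of the formula rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: one oscillation at 0.5 Hz plus Gaussian noise.
N = 200
t = np.linspace(0, 20, N)
d = np.sin(2 * np.pi * 0.5 * t) + rng.normal(0, 0.5, N)
d2_bar = np.mean(d ** 2)                 # mean-square of the data

def log_posterior(w):
    # Model functions for trial angular frequency w: cos and sin (m = 2).
    phi = np.stack([np.cos(w * t), np.sin(w * t)])
    m = phi.shape[0]
    Phi = phi @ phi.T                    # matrix of the model functions
    L = np.linalg.cholesky(Phi)          # orthogonalisation step, Phi = L L^T
    phi_on = np.linalg.solve(L, phi)     # orthonormal model functions
    h = phi_on @ d                       # projections h_j of the data
    h2_bar = np.mean(h ** 2)
    # log of [1 - m*h2_bar/(N*d2_bar)]**((m - N)/2), up to a constant
    return (m - N) / 2 * np.log(1 - m * h2_bar / (N * d2_bar))

ws = np.linspace(0.1, 6.0, 2000)         # grid of angular frequencies
logp = np.array([log_posterior(w) for w in ws])
w_hat = ws[np.argmax(logp)]
print(w_hat / (2 * np.pi))               # close to the true 0.5 Hz
```

With more frequencies, the model simply gains more columns of model functions and the posterior becomes a surface over the ω-tuple, which is where grid evaluation gives way to samplers such as Nested Sampling.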