Optimisation of an exemplar oculomotor model using multi-objective genetic algorithms executed on a GPU-CPU combination

Background Parameter optimisation is a critical step in the construction of computational biology models. In eye movement research, computational models are increasingly important to understanding the mechanistic basis of normal and abnormal behaviour. In this study, we considered an existing neurobiological model of fast eye movements (saccades), capable of generating realistic simulations of: (i) normal horizontal saccades; and (ii) infantile nystagmus – pathological ocular oscillations that can be subdivided into different waveform classes. By developing appropriate fitness functions, we optimised the model to existing experimental saccade and nystagmus data, using a well-established multi-objective genetic algorithm. This algorithm required the model to be numerically integrated for very large numbers of parameter combinations. To address this computational bottleneck, we implemented a master-slave parallelisation, in which the model integrations were distributed across the compute units of a GPU, under the control of a CPU. Results While previous nystagmus fitting has been based on reproducing qualitative waveform characteristics, our optimisation protocol enabled us to perform the first direct fits of a model to experimental recordings. The fits to normal eye movements showed that although saccades of different amplitudes can be accurately simulated by individual parameter sets, a single set capable of fitting all amplitudes simultaneously cannot be determined. The fits to nystagmus oscillations systematically identified the parameter regimes in which the model can reproduce a number of canonical nystagmus waveforms to a high accuracy, whilst also identifying some waveforms that the model cannot simulate. Using a GPU to perform the model integrations yielded a speedup of around 20 compared to a high-end CPU. Conclusions The results of both optimisation problems enabled us to quantify the predictive capacity of the model, suggesting specific modifications that could expand its repertoire of simulated behaviours. In addition, the optimal parameter distributions we obtained were consistent with previous computational studies that had proposed the saccadic braking signal to be the origin of the instability preceding the development of infantile nystagmus oscillations. Finally, the master-slave parallelisation method we developed to accelerate the optimisation process can be readily adapted to fit other highly parametrised computational biology models to experimental data. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0416-2) contains supplementary material, which is available to authorized users.


The GPU executable
Programs that are executed on a GPU have to be written using a specific programming framework. The two most widely used frameworks are the NVIDIA Compute Unified Device Architecture (CUDA) [1] and the Open Computing Language (OpenCL) [2]. The CUDA framework is a proprietary architecture specifically designed to be run only on NVIDIA GPUs. By contrast, OpenCL which is managed by the Khronos Group [3], is an open framework which can be used to program almost any parallel architecture, such as multicore CPUs, GPUs, coprocessors (e.g. the Intel Xeon Phi) or a network of connected GPUs and/or CPUs. To ensure better hardware compatibility of our code, we therefore developed our executable in the OpenCL framework.
OpenCL includes a C99 based language for writing functions (kernels), which are executed on OpenCL devices (e.g. CPUs or GPUs) and an application programming interface (API). The API is used to define and control the OpenCL devices and execute the kernels on them. A simple OpenCL program involves the steps of finding the computing device, compiling the code that will be run on the device, copying the data to the device, performing the computation and copying back the results of the computation.
Our OpenCL program includes a kernel with functions that numerically solve the saccadic model for different parameter combinations. Moreover, our program includes code written in C++, which, using the OpenCL API, performs the actions necessary to run the kernel on the GPU. The numerical method that we used in the kernel was the implicit mid-point rule [4], with a time step ∆t " 5ˆ10´6 (as the saccadic model is a stiff system, a small time step is required for accurate integration). To assess whether the ODE solver and the selected time step gave accurate results, we compared the model solutions generated using our method with those obtained using the MATLAB stiff ODE solver ode15s, with relative and absolute tolerances both set to 10´8.
The logical flow of the GPU executable program is as follows. First, the program reads the input binary file which contains the parameter combinations comprising the NSGA-II population and the initial conditions. Subsequently, these parameter values and initial conditions are transferred to the GPU's RAM. Next, the OpenCL kernel is called to integrate the model for 80 time steps. In parallel with the integration (the current kernel call), the results of the previous integration (the previous kernel call) are written to a binary file. Once the model has been integrated over the required timespan for all parameter combinations, the program quits.
We found that the sampling frequency of the results (i.e. the number of points in the time series used for calculating fitness) played an important role in the GPU computation time when fitting the model to nystagmus waveforms. This was due to the bottleneck caused by the transfer to and from the GPU memory, and the time required to write the data to the binary file. Higher sampling frequencies substantially increased the time required for model integration and solution analysis (fitness evaluation) and also the size of the binary file produced. For example, using a sampling frequency of 2500 Hz, 20000 different parameter sets with a simulation time of 6 s each produced a 2.23 GB file, with an overall computation time of 35.44 s. Increasing the sampling frequency to 5000 Hz increased the file size to 4.46 GB and the overall computation time to 45.56 s. However, it was not possible to overcome this problem by arbitrarily reducing the sampling rate, since by considering fits to synthetic nystagmus waveforms (see Figure S4), we found that a low sampling frequency did not allow accurate period and amplitude measures to be obtained. In order to choose the lowest sampling frequency capable of resolving this speed/accuracy trade-off, we generated very highly sampled versions of the synthetic waveforms in Figure S4 by integrating the model for each corresponding parameter set with ∆t set to 5ˆ10´6 s (i.e. a sampling frequency of 5ˆ10 6 Hz). We then subsampled each time series at 250, 500, 1000, 1250, 2500 and 5000 Hz to obtain a set of candidate waveforms. Next, we calculated the sum-of-square errors for amplitude and period between the highly sampled and candidate waveforms. On the basis of these results, the final sampling frequency was chosen to be 2500 Hz, as using a higher frequency caused no significant reduction in error. Returning results at this frequency enabled us to minimise the effect of the GPU call bottleneck, the amount of memory used and the computation time required to analyse the results, without significantly degrading the accuracy of the period and amplitude calculations.      Table S1. The initial conditions were set to gp0q " np0q " rp0q " lp0q " 0 and mp0q " 2 in each case. The simulated waveform types are: asymmetric pseudocycloid (A); pseudo-cycloid (B); jerk (C); and bidirectional jerk (D). On each plot, the vertical axis represents the horizontal gaze angle in degrees (˝), with positive values denoting rightward eye positions. Time is in seconds (s).  Table S2 for amplitudes of 5, 10 and 20 degrees. For each amplitude ∆g, the initial conditions were set to gp0q " np0q " rp0q " lp0q " 0 and mp0q " ∆g. The black circles indicate the initiation and termination of each saccade, calculated by applying a velocity threshold of 2 deg/s. On each plot, the vertical axis represents the horizontal eye velocity in deg/s (˝/s), with positive values denoting rightward motion. Time is in seconds (s).                       Fig. 13 about the origin of objective space. In each plot, the red cross (+), red star (˚) and red x (ˆ) indicate the solutions yielding the minimum Euclidean distance to the axes origin, the best fit to a 10 degree saccade and the best fit to a 20 degree saccade, respectively.    Table S3: Time required to run NSGA-II 16 times for the four synthetic nystagmus waveforms shown in Fig. S4 as a function of population size. The computation times shown here were obtained using the multiple parallel runs method. The CPU used was the Intel i7-4790K and the GPU used was the AMD Firepro W8100.    Table S7: Optimised parameter values for experimental nystagmus waveform D. The parameter values were generated from 16 NSGA-II runs with a population size of 4000.   Table S11: Optimised parameter values for experimental saccades with selection method D. The parameter values were generated from 16 NSGA-II runs with a population size of 8000.