 Methodology article
 Open Access
A cooperative strategy for parameter estimation in large scale systems biology models
 Alejandro F Villaverde^{1},
 Jose A Egea^{2} and
 Julio R Banga^{1}
https://doi.org/10.1186/1752-0509-6-75
© Villaverde et al.; licensee BioMed Central Ltd. 2012
Received: 21 February 2012
Accepted: 11 June 2012
Published: 22 June 2012
Abstract
Background
Mathematical models play a key role in systems biology: they summarize the currently available knowledge in a way that makes it possible to derive experimentally verifiable predictions. Model calibration consists of finding the parameters that give the best fit to a set of experimental data, which entails minimizing a cost function that measures the goodness of this fit. Most mathematical models in systems biology present three characteristics which make this problem very difficult to solve: they are highly nonlinear, they have a large number of parameters to be estimated, and the information content of the available experimental data is frequently scarce. Hence, there is a need for global optimization methods capable of solving this problem efficiently.
Results
A new approach for parameter estimation of large-scale models, called Cooperative Enhanced Scatter Search (CeSS), is presented. Its key feature is the cooperation between different programs (“threads”) that run in parallel on different processors. Each thread implements a state-of-the-art metaheuristic, the enhanced Scatter Search algorithm (eSS). Cooperation, meaning information sharing between threads, modifies the systemic properties of the algorithm and speeds up its performance. Two parameter estimation problems involving models related to the central carbon metabolism of E. coli, which include different regulatory levels (metabolic and transcriptional), are used as case studies. The performance and capabilities of the method are also evaluated using benchmark problems of large-scale global optimization, with excellent results.
Conclusions
The cooperative CeSS strategy is a general-purpose technique that can be applied to any model calibration problem. Its capability has been demonstrated by calibrating two large-scale models of different characteristics, improving the performance of previously existing methods in both cases. The cooperative metaheuristic presented here can be easily extended to incorporate other global and local search solvers and specific structural information for particular classes of problems.
Background
The aim of systems biology is to understand the organization of complex biological systems by combining experimental data with mathematical modeling and advanced computational techniques. Mathematical models play a key role, since, among other functions, they summarize the currently available knowledge in a way that makes it possible to derive experimentally verifiable predictions. Due to the increasing amount of “omic” data available from high-throughput techniques, there is a need for large-scale model building methods.
Model building is a complex task that usually follows an iterative process[1–7]. It begins with the definition of the purpose of the model, that is, with the determination of the questions that the model should be able to answer. This conditions the modeling framework and the information that must be included in the model. The next step is to propose a mathematical structure with the necessary level of detail, which will in general include a number of unknown, nonmeasurable parameters. An estimation of these parameters is needed in order to obtain quantitative predictions; this is the next step, which is commonly known as parameter estimation or identification, data fitting, or model calibration[1, 4–6]. The final step is model validation, which entails testing the model with new data; if this reveals modeling errors, the process should be iteratively repeated.
In recent years a number of approaches to large-scale kinetic modelling have been presented. Jamshidi and Palsson[8] presented a procedure to construct dynamic network models in a scalable way using metabolomic data mapped onto existing stoichiometric models, thus incorporating kinetics and regulation into those stoichiometric models. Liebermeister and Klipp[9, 10] introduced a simplified, general rate law named convenience kinetics. It can be derived from a random-order enzyme mechanism, and it may be used to obtain a dynamical model from a biochemical network. The resulting model has plausible biological properties. More recently, the same authors proposed five modular rate laws[11], characterized by different formulae of their denominators, as a way of parameterizing metabolic network models. Tran et al[12] advocated the use of an ensemble of dynamic models. The models in the ensemble share the same mechanistic framework, span the space of all kinetics allowable by thermodynamics, and reach the same steady state. The size of the ensemble is reduced by acquiring data, converging to a smaller and more predictive set of models, which are able to describe relevant phenotypes upon enzyme perturbations. The well-known flux balance analysis method (FBA) has been widely used for large-scale analysis of metabolic networks. In[13] it was extended in order to incorporate transcriptional regulation (regulatory FBA, or rFBA); further developments also included signal transduction networks (integrated FBA or iFBA[14], and integrated dynamic FBA or idFBA[15]). Another integration of FBA with kinetic information was carried out by Smallbone et al. In[16] they presented a method for building a parameterized genome-scale kinetic model of a metabolic network, based solely on the knowledge of reaction stoichiometries. Fluxes were estimated by flux balance analysis and allowed to vary dynamically according to lin-log kinetics. This method was applied[17] to a model of S. cerevisiae comprising 956 metabolic reactions and 820 metabolites. Kotte et al[18] modeled the coupling of enzymatic and transcriptional regulation of E. coli’s central metabolism using differential equations. The resulting medium-scale model is able to explain, from the interplay of known interactions, the adjustments of metabolic operation between glycolytic and gluconeogenic carbon sources. Another interesting contribution to the problem of building large-scale mathematical models of biological systems can be found in[19]. It extends an ordinary differential equation (ODE) model of ErbB signaling pathways to include 28 proteins, 499 ODEs, 828 reactions, and 229 parameters. The parameter space size is reduced to 75, following an initial sensitivity analysis based on estimates of nominal parameter values. Parameter estimation is then carried out by means of extensive computations using a stochastic method (simulated annealing).
Calibrating such large-scale models is a very difficult task, mainly due to three characteristics of these problems:
 1.
They are highly nonlinear, which creates multimodality, so standard local methods (e.g. Levenberg-Marquardt or Gauss-Newton) converge to local solutions. This must be overcome with the use of global methods capable of finding the optimum in a rugged landscape.
 2.
There is a large number of parameters to be estimated. This is an especially problematic issue, since the necessary computational effort increases very rapidly with the problem size. The increase may be exponential, thus preventing methods that work well for a limited number of parameters from being applied to realistically sized problems. This means that global deterministic methods, which guarantee finding the global optimum, fail to provide the solution in reasonable computing times. Hence, stochastic methods must be used instead.
 3.
The information content of the available experimental data is frequently scarce, which might cause an identifiability problem (i.e., different combinations of parameter values may produce similar model outputs). An example illustrating this point can be found in the aforementioned work of Chen et al [19], where the estimation of parameters yielded the result that the model is nonidentifiable.
Due to these difficulties, no full and proper model calibration has been performed so far in the large-scale kinetic models referenced above. In many cases, a large subset of parameters were taken, where available, from previous models or databases, but this procedure, although practical for some applications, is not equivalent to a proper calibration of the large-scale model. Our main objective in this paper is to present a novel metaheuristic which exploits cooperative parallelism in order to surmount the difficulties mentioned above. Parallel implementations of global optimization methods have been used recently in systems biology (see[21] and references therein). Here we have improved this concept by incorporating cooperation of individual search threads.
The novelty of our approach has the following two main pillars:

The use of an efficient global optimization method for solving largescale parameter estimation problems, based on extensions of the scatter search metaheuristic[22–25].

A parallel implementation of the algorithm incorporating a cooperative strategy, which means that there is an exchange of information between a set of individual optimization programs, or threads, running in parallel. As a result, this cooperation not only speeds up the algorithm, but also alters its systemic properties, thus yielding performance improvements more than proportional to the increase in computing power.
The capabilities of these novel methods are illustrated considering two case studies based on E. coli, which are described below.
This paper is structured as follows: after the presentation of the problem statement, we discuss the need for global optimization methods, and then we present a cooperative parallel method which is able to handle the challenges arising from this class of problems. Results for two different case studies are then presented and discussed, finally arriving at a set of conclusions.
Problem statement
Given a model of a nonlinear dynamic system and a set of experimental data, the parameter estimation problem consists of finding the vector of decision variables p (unknown model parameters) that minimizes a cost function that measures the goodness of the fit of the model predictions with respect to the data, subject to a number of constraints. The output state variables that are measured experimentally are called observables. The cost function is a weighted least-squares measure of the form

$$J(p)=\sum_{\epsilon =1}^{{n}_{\epsilon}}\sum_{o=1}^{{n}_{o}^{\epsilon}}\sum_{s=1}^{{n}_{s}^{\epsilon ,o}}{\left({ym}_{s}^{\epsilon ,o}-{y}_{s}^{\epsilon ,o}(p)\right)}^{T}\,W\,\left({ym}_{s}^{\epsilon ,o}-{y}_{s}^{\epsilon ,o}(p)\right)$$

where ${n}_{\epsilon}$ is the number of experiments, ${n}_{o}^{\epsilon}$ is the number of observables per experiment, and ${n}_{s}^{\epsilon ,o}$ is the number of samples per observable per experiment. The measured data are denoted as ${ym}_{s}^{\epsilon ,o}$ and the corresponding model predictions as ${y}_{s}^{\epsilon ,o}(p)$. Finally, W is a scaling matrix used to balance the contributions of the observables (usually by scaling each temporal series with respect to its maximum value).
The minimization is subject to the system dynamics and to additional constraints:

$$f(\dot{x},x,p,t)=0,\quad x({t}_{0})={x}_{0},\quad y=g(x,p)$$
$$h_{eq}(x,y,p)=0,\quad h_{in}(x,y,p)\le 0,\quad {p}^{L}\le p\le {p}^{U}$$

where g is the observation function, x is the vector of state variables with initial conditions x_{0}, f is the set of differential and algebraic equality constraints describing the system dynamics (that is, the nonlinear process model), h_{ eq } and h_{ in } are equality and inequality constraints that express additional requirements for the system performance, and p^{ L } and p^{ U } are lower and upper bounds for the parameter vector p.
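The weighted least-squares cost described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `simulate` function, the data layout, and the use of a per-observable weight vector (in place of a full scaling matrix W) are simplifying assumptions.

```python
import numpy as np

def lsq_cost(p, experiments, simulate, w):
    """Weighted least-squares cost J(p), summed over experiments,
    observables and samples.

    p           : parameter vector
    experiments : list of dicts, each holding measured data 'ym'
                  (array of shape n_samples x n_observables)
    simulate    : hypothetical user function simulate(p, exp) returning
                  model predictions with the same shape as exp['ym']
    w           : per-observable weights (e.g. 1/max of each time series)
    """
    J = 0.0
    for exp in experiments:
        residual = exp['ym'] - simulate(p, exp)  # data minus prediction
        J += np.sum((w * residual) ** 2)         # scaled squared residuals
    return J
```

In practice `simulate` would integrate the ODE model for the experiment's conditions; here it is left as a user-supplied callable so the cost structure stays visible.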
As was mentioned in the Background section, in systems biology models this problem is often multimodal (nonconvex), due to the nonlinear and constrained nature of the system dynamics. Hence, standard local methods usually fail to obtain the global optimum. As an alternative, one may choose a multistart strategy, where a local method is used repeatedly, starting from a number of different initial guesses for the parameters. However, this approach is usually not efficient for realistic applications, and global optimization (GO) techniques, such as the ones presented in the next section, need to be used instead.
Methods
To efficiently solve the calibration problem there is a need for global optimization methods whose performance does not deteriorate when the number of parameters to be estimated is very large (as occurs with most methods). Global optimization methods can be divided into deterministic and stochastic. Deterministic global optimization methods guarantee that the solution is the global optimum, but the computational effort they require can make them unaffordable for large-scale problems. Stochastic global optimization methods, on the other hand, do not guarantee the global optimality of the solution, but they are frequently capable of finding excellent solutions in reasonable computation times. A particularly efficient class of stochastic global optimization methods are the so-called metaheuristic approaches, which combine mechanisms for exploring the search space and exploiting previously obtained knowledge. Here we propose a method, Cooperative enhanced Scatter Search (CeSS), based on extensions and cooperative parallelization of the scatter search metaheuristic[22–25].
Global optimization via enhanced scatter search
Scatter search can be classified as an evolutionary optimization method which, like many other metaheuristics, arose in the context of integer optimization[26]. Several adaptations to continuous problems have been developed in recent years. The application of the method to parameter estimation in systems biology problems has provided excellent results[22].
Scatter search is a population-based algorithm which, compared with other methods like genetic algorithms, makes use of a low number of population members, called the “Reference Set” (RefSet) in this context. Besides, scatter search includes the so-called “improvement method”, which usually consists of a local search to speed up the convergence to optimal solutions.
The method starts by creating an initial population of diverse solutions within the search space (Figure 1a). From them, the best solutions in terms of quality are selected to create the initial RefSet (in red in Figure 1b). The rest of the solutions in the RefSet are chosen randomly (in green in Figure 1b), although in some advanced implementations they may be chosen following a criterion of maximum diversity to cover a broader area of the search space. Once the RefSet solutions have been selected, the remaining diverse solutions generated in the first step are deleted and the so-called “subset generation method” starts. It consists of combining sets of RefSet solutions to create new ones. In our scheme, we combine every solution in the RefSet with the other RefSet members by generating hyperrectangles defined by their relative position and distance, and creating new solutions inside them. This step is illustrated in Figure 1c. Each RefSet member creates one hyperrectangle with every other RefSet member, thus generating b−1 new solutions (where b is the RefSet size) per RefSet member. After this procedure, all the RefSet members have a set of offspring solutions around them covering different distances and directions (Figure 1d). If the best offspring solution outperforms the parent solution (which is the case in Figure 1d, with the best offspring solution represented as a star), the replacement (or update) is carried out. Otherwise, the same RefSet member stays in the population in the next iteration. Although the procedure is illustrated for just one RefSet member, the same scheme applies to every solution in the RefSet. After the RefSet update, the algorithm applies the “solution combination method” again, repeating the process until a stop criterion is met.
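The combination step just described can be sketched as follows. This is a simplified illustration, and one assumption should be noted: the hyper-rectangle is taken here as the plain axis-aligned box spanned by the two parent solutions, whereas eSS shapes it according to the relative position, distance, and quality of the parents.

```python
import numpy as np

def combine_refset_member(parent, refset_others, rng):
    """Create one offspring per (parent, other) pair by sampling a point
    uniformly inside the hyper-rectangle spanned by the two solutions."""
    offspring = []
    for other in refset_others:
        lo = np.minimum(parent, other)   # lower corner of the box
        hi = np.maximum(parent, other)   # upper corner of the box
        offspring.append(rng.uniform(lo, hi))
    return offspring
```

For a RefSet of size b, calling this for each member yields the b−1 offspring per member mentioned in the text.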
In this work, a novel extended implementation of the enhanced scatter search metaheuristic (eSS)[24, 25], has been used. Its pseudocode is shown in Algorithm 1.
Algorithm 1
Basic pseudocode of eSS
Set parameters: dim_refset, local.n2, balance, ndiverse
Initialize n_{ stuck }, neval
Create set of ndiverse solutions
Generate initial RefSet with dim_refset solutions with high quality and random elements
repeat
Sort RefSet by quality [x^{1},x^{2},…,x^{ dim_refset }] so that f(x^{ i }) ≤ f(x^{ j }) where i,j∈[1,2,…,dim_refset] and i < j
if max$\left(\mathit{\text{abs}}\left(\frac{{x}^{i}-{x}^{j}}{{x}^{j}}\right)\right)\le \epsilon $ with i < j then
Replace x^{ j }by a random solution
end if
for i = 1 to dim_refset do
Combine x^{ i }with the rest of population members to generate a set of dim_refset new solutions,
offspring ^{ i }
${x}_{\text{off}}^{i}=$ best solution in offspring^{ i }
if${x}_{\text{off}}^{i}$ outperforms x^{ i }then
Label x^{ i }
x_{temp} = x^{ i }
improvement = 1
∧ = 1
while $f\left({x}_{\text{off}}^{i}\right)<f\left({x}_{\text{temp}}\right)$ do
Create a new solution, x_{new}, in the hyperrectangle defined by$\left[{x}_{\text{off}}^{i}-\frac{{x}_{\text{temp}}-{x}_{\text{off}}^{i}}{\wedge},{x}_{\text{off}}^{i}\right]$
${x}_{\text{temp}}={x}_{\text{off}}^{i}$
${x}_{\text{off}}^{i}={x}_{\text{new}}$
improvement = improvement + 1
if improvement = 2 then
∧ = ∧/2
improvement = 0
end if
end while
end if
end for
offspring=$\left(\begin{array}{c}{\mathbf{\text{offspring}}}^{1}\\ {\mathbf{\text{offspring}}}^{2}\\ \dots \\ {\mathbf{\text{offspring}}}^{n}\end{array}\right)$ with n = dim_refset
if neval ≥ local.n2 and at least one local solution has been found then
Sort offspring solutions by quality [x_{1,1},x_{2,1},…,x_{m,1}] where m = n(n−1) and f(x_{i,1}) ≤ f(x_{j,1}) where i,j∈[1,2,…,m] and i < j
Compute the minimum distance between each element i ∈ [1,2,…,m] in offspring and all the local optima found so far, d(x_{ i })
Sort offspring solutions by diversity [x_{1,2},x_{2,2},…,x_{m,2}] with d(x_{i,2}) ≥ d(x_{j,2}) where i,j∈[1,2,…,m] and i < j
Choose z as the offspring member balancing quality and diversity according to balance
Perform a local search from z to obtain z^{∗}
if z^{∗} is a new local optimum then
Update list of found local optima adding z^{∗}
end if
if f(z^{∗}) < fbest then
Update xbest, fbest
end if
neval = 0
end if
neval = neval + function evaluations of current iteration
Replace labeled population members by their corresponding${x}_{\text{off}}^{i}$ and reset their corresponding n_{ stuck }(i)
Add one unit to the corresponding n_{ stuck }(j) of the not labeled population members
if any of the n_{ stuck }values > nchange then
Replace those population members by random solutions and reset their n_{ stuck }values
end if
until stopping criterion is met
Apply final local search over xbest and update it
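The inner while loop of Algorithm 1 (the intensification that keeps moving past an improving offspring) can be sketched in Python as follows. This is our reading of the pseudocode, not the reference implementation; uniform sampling inside the extension hyper-rectangle and the clipping to the search box are assumptions.

```python
import numpy as np

def go_beyond(f, x_parent, x_off, lb, ub, rng):
    """While the new point keeps improving, sample inside the box that
    extends beyond it (away from the parent), halving the extension
    parameter Lambda after every two consecutive improvements."""
    x_temp = np.asarray(x_parent, dtype=float)
    x_off = np.asarray(x_off, dtype=float)
    improvement, lam = 1, 1.0
    while f(x_off) < f(x_temp):
        far = x_off - (x_temp - x_off) / lam        # corner beyond x_off
        lo, hi = np.minimum(far, x_off), np.maximum(far, x_off)
        x_new = np.clip(rng.uniform(lo, hi), lb, ub)
        x_temp, x_off = x_off, x_new                # advance along direction
        improvement += 1
        if improvement == 2:
            lam /= 2.0                              # enlarge the extension box
            improvement = 0
    return x_temp                                   # last improving point
```

Dividing Lambda by two makes the extension box larger, so the search takes progressively bolder steps as long as the promising direction keeps paying off.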
The innovative mechanisms implemented in different parts of the method, as well as the appropriate local search selection, make the algorithm especially suitable for solving large-scale optimization problems. These mechanisms, which address the most important question in global optimization (i.e., the balance between intensification, or local search, and diversification, or global search), are described in the following points:

Population-based algorithms with a large population size can become inefficient due to the large number of function evaluations they perform in each iteration. On the other hand, if the population size is too small, the algorithm may converge prematurely to suboptimal solutions. The enhanced scatter search method uses a small population size, even for large-scale problems, while allowing more search directions than classical scatter search designs thanks to the hyperrectangle-based combination scheme. This preserves diversity in the search without increasing the number of evaluations per iteration.

Another key aspect in population-based algorithms is the population update scheme. (μ + λ) strategies[27] (i.e., where members of the following population are selected from the set including both old population members and new offspring solutions) may tend to prematurely stagnate the population in suboptimal solutions. (μ, λ) strategies (i.e., where members of the following population are selected only among the new offspring solutions) may need a high number of function evaluations to converge. On the other hand, a (1 + 1) strategy applied to every population member, as depicted in Figures 1a-d, can take advantage of both mechanisms. This is the strategy used in eSS and, again, provides a good balance between intensification (i.e., local search) and diversification (i.e., global search).

The specific intensification mechanisms are crucial in large-scale optimization in order to reduce the computational times as much as possible. eSS implements an intensification mechanism already in the global phase, which exploits the promising directions defined by a pair of solutions in the RefSet. Besides, the use of a local search method to accelerate the convergence is mandatory in this type of application. Gradient-based methods are the most widely used local methods. However, for dynamic problems they might be inefficient due to some characteristics of this type of problem (see[28] for more details). Furthermore, the calculation of the numerical sensitivities carried out by these methods may become unaffordable when the problem dimension is too high. A heuristic local search method seems to be the most convenient choice when dealing with large-scale problems because, even if local convergence might not be ensured, avoiding the calculation of sensitivities or search directions may save a large amount of computation time. For this reason, and after extensive initial tests, we have selected a heuristic method known as “Dynamic Hill Climbing”[29], available in the eSS framework. This method provided the best results for the problems tested.

Even if intensification strategies should be favoured in large-scale optimization, due to the limited computational time usually available in real applications, diversification strategies should not be neglected, because stagnation of the search may appear even in early stages of the process. In this regard, eSS implements diversification strategies which make use of memory to infer whether a solution is stagnated in a local optimum or is too close to previously found solutions. When such solutions are identified, they are automatically replaced to avoid spending computational effort searching in areas already explored.
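This memory-based replacement can be sketched as follows, mirroring the n_stuck/nchange bookkeeping of Algorithm 1. The function name and the uniform re-seeding are our own illustrative choices.

```python
import numpy as np

def refresh_stagnated(refset, n_stuck, nchange, lb, ub, rng):
    """Replace RefSet members whose stagnation counter n_stuck exceeds
    nchange by fresh random solutions, resetting their counters."""
    for i in range(len(refset)):
        if n_stuck[i] > nchange:
            refset[i] = rng.uniform(lb, ub)  # re-seed in the search box
            n_stuck[i] = 0
    return refset, n_stuck
```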
Cooperative optimization
A common way of reducing the computational cost of a given algorithm is to parallelize it. This entails performing several of its calculations simultaneously on different processors. Except for some special cases, usually called “embarrassingly parallel”, it is not trivial to separate a problem into parallel tasks. Here we are interested in the parallelization of a metaheuristic optimization algorithm, the enhanced scatter search method described in the previous section.
Before describing the proposed methodology, we clarify the use of a few terms in order to avoid confusion. We consider a context where a number of processors are available for computing and can work simultaneously (in parallel). All of the processors are assumed to be physically identical, that is, they have the same hardware characteristics. A program running on a processor is called a thread; we use both terms interchangeably. In our methodology we use a master processor, whose task is to coordinate the remaining processors (slaves). All of the slave processors run the same program, which means they perform similar calculations, but using different data sets and different settings.
There are not many examples of research in the area of parallelization of metaheuristics exploiting cooperation in the way we present here. In[30] three different parallelizations of the scatter search were proposed and compared. In the first one, each local search was parallelized. In the second one, several local searches were carried out in parallel. In these two cases it was possible to reduce the computational time. A third option was to run the algorithm for different populations in parallel, but without any communication between threads.
More recently, a more ambitious approach was presented in[31], where different kinds of metaheuristics were combined in order to profit from their complementary advantages. Information was shared between algorithms, and the sharing was dynamically tuned, so that the most successful algorithms were allowed to share more information. For a detailed review of the state of the art in parallelization of metaheuristics, see[32].
Sharing information can be seen as a way of cooperating. Such cooperation produces more than just speed-up: it can change the systemic properties of an algorithm, and therefore its macroscopic behaviour. This was acknowledged in[33], where four parameters that may affect this behaviour were identified: (i) the information gathered and made available for sharing; (ii) the identification of the programs that share information; (iii) the frequency of information sharing among the sequential programs; and (iv) the number of concurrent programs. In the strategy presented here, these four aspects are defined as follows:
 1.
Information available for sharing: the best solution found and, optionally, the reference set (RefSet), which contains information about the diversity of solutions.
 2.
Threads that share information: all of them.
 3.
Frequency of information sharing: the threads exchange information at a fixed interval τ.
 4.
Number of concurrent programs: η.
Each of the η threads has a fixed degree of “aggressiveness”. “Conservative” threads emphasize diversification (global search) and are used to increase the probability of finding a feasible solution, even if the parameter space is “rugged” or weakly structured. “Aggressive” threads emphasize intensification (local search) and speed up the calculations in “smoother” areas. Note that both conservative and aggressive threads should be able to locate the globally optimal solution of the class of problems considered here, but their relative efficiency will vary depending on the region of the parameter space where the individual threads are operating. In this way, we achieve a synergistic gain. Communication, which takes place at fixed time intervals, enables each thread to benefit from the knowledge gathered by the others. This knowledge may include not only information about the best solution found so far, but also about the sets of diverse parameter vectors that may be worth trying to improve the solution. Thus this strategy has several degrees of freedom that have to be fixed: the time between communications (τ), the number of threads (η), and the strategy adopted by each thread (Ψ). These adjustments should be chosen carefully depending on the particular problem to be solved. In the following lines we give some guidelines for doing this.
Time between information sharing, τ
The time between information sharing is defined as

$$\tau =\mathrm{log}_{10}\left({n}_{\mathit{\text{eval}}}\right)$$

where n_{ eval } is the maximum number of function evaluations allowed between cooperation instants. Therefore τ, as defined above, is a dimensionless and machine-independent parameter that sets the duration of the intervals between information sharing. The logarithm is used for the convenience of avoiding very large numbers. For all the problems considered in this study, we have empirically found that a value in the interval 2.5 < τ < 3.5, with a default value of 2.5, results in a good trade-off between efficiency and robustness of the cooperative strategy.
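Assuming the relation between τ and the evaluation budget is the base-10 logarithm described above (an assumption of this sketch), the budget between sharing instants is recovered as 10^τ; the helper name below is ours.

```python
def evals_between_sharing(tau):
    """Maximum number of function evaluations between two cooperation
    instants, assuming the dimensionless parameter tau = log10(n_eval)."""
    return int(round(10.0 ** tau))
```

So the recommended interval 2.5 < τ < 3.5 corresponds to roughly a few hundred to a few thousand evaluations between sharing instants.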
Number of concurrent threads, η
As η increases, the expected speed-up increases too. However, this increase cannot be linear for an arbitrarily large number of threads, for several reasons. One reason is that, as the number of threads increases, so does the communication overhead. More importantly, the risk of overlapping (that is, carrying out similar searches in different threads) also increases, due to the relatively small size of the RefSet. Hence, when the number of threads is very large, the resulting speed-up is smaller than the increase in computational effort. For the problems considered here, we have empirically found that η = 10 led to good results, and it can thus be recommended as a default value.
Settings (strategy) for each thread, Ψ
For cooperation to be efficient, every thread running in parallel must perform different calculations. Due to the random nature of metaheuristic algorithms, this can be achieved simply by choosing different random initializations, or “seeds”, for each thread. This results first in different sets of initial guesses, and subsequently in the exploration of different points of the search space. However, this is not the only difference that may be made between threads: we can also select a different behaviour (more or less aggressive) for each of the threads. An “aggressive” thread performs frequent local searches, trying to refine the solution very quickly, and keeps a small reference set of solutions. This means that it will perform well in problems with parameter spaces that have a smooth shape. “Conservative” threads, on the other hand, have a large reference set and perform local searches only sporadically. This means that they spend more time combining parameter vectors and exploring the different regions of the parameter space. Thus, they are more robust, and hence appropriate, for problems with rugged parameter spaces. Since the exact nature of the problem to consider is always unknown, it is advisable to choose a range of settings that yields conservative, aggressive, and intermediate threads. For the eSS algorithm this can be obtained by tuning the following settings:

Number of elements in the Reference Set (dim_refset).

Minimum number of iterations of the eSS algorithm between two local searches (local.n2).

Balance between intensification and diversification in the selection of initial points for the local searches (balance).

Number of diverse solutions initially generated (ndiverse).
All these settings have qualitatively the same influence on the algorithm’s behaviour: large values lead to conservative threads; small values, to aggressive threads. The four parameters that define the degree of aggressiveness of each of the η threads can be stored in an array Ψ_{η×4}. By default, we recommend using a broad spectrum of aggressiveness, with values in the ranges 0.5·n_{ par } < dim_refset < 20·n_{ par }; 0 < local.n2 < 100; 0 < balance < 0.5; and 5·n_{ par } < ndiverse < 20·n_{ par }.
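One possible way to spread the η threads over this spectrum is to interpolate each setting linearly between its aggressive and conservative extremes. This parameterization is our own sketch; eSS may sample these ranges differently.

```python
import numpy as np

def make_thread_settings(eta, n_par):
    """Build the eta x 4 array Psi = [dim_refset, local_n2, balance,
    ndiverse], from the most aggressive thread (small values, row 0)
    to the most conservative one (large values, last row), using the
    ranges suggested in the text."""
    frac = np.linspace(0.0, 1.0, eta)   # 0 = aggressive, 1 = conservative
    dim_refset = np.round((0.5 + 19.5 * frac) * n_par)
    local_n2 = np.round(100.0 * frac)
    balance = 0.5 * frac
    ndiverse = np.round((5.0 + 15.0 * frac) * n_par)
    return np.column_stack([dim_refset, local_n2, balance, ndiverse])
```

Here a thread is uniformly aggressive or conservative across all four settings; mixing the settings independently per thread would be another valid design.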
Algorithm 2
Basic pseudocode of the cooperative strategy
MASTER, PROCESSOR 0: Initialization
Set parameters that are common for all threads: τ, η, n_{ iters }
Set array of aggressiveness parameters: Ψ_{η×4} = {dim_refset_{η×1},ndiverse_{η×1},local.n2_{η×1},balance_{η×1}}
Initialize global reference set array: Global_{ ref } = []
for i = 1 to n_{ iters }do
for j = 1 to η do
SLAVE, PROCESSOR j (concurrent processing): run optimization according to Algorithm 1
(Thread j uses aggressiveness settings vector Ψ_{ j,: } and initial points Global_{ ref })
end for
if CPUtime ≥ τ then
for j = 1 to η do
Read results of thread j
if ref_{ j }∉ Global_{ ref }then
Global_{ ref }= [Global_{ ref },ref_{ j }]
end if
end for
end if
end for
Final solution = best solution in Global_{ ref }
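The master-slave loop of Algorithm 2 can be sketched serially as follows. This is a heavily simplified stand-in: the η "threads" run one after another instead of concurrently, and a plain random local search replaces eSS; only the cooperation pattern (pooling results into a shared reference set that re-seeds every thread) is the point of the sketch.

```python
import numpy as np

def cooperative_search(f, lb, ub, eta=4, n_epochs=5,
                       evals_per_epoch=200, seed=0):
    """Serial sketch of the cooperative strategy: eta searchers run
    between sharing instants; at each instant their best points are
    pooled into a global reference set, which re-seeds every searcher."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    global_ref = [rng.uniform(lb, ub) for _ in range(eta)]
    for _ in range(n_epochs):
        results = []
        for _ in range(eta):                  # slaves (concurrent in CeSS)
            best = min(global_ref, key=f)     # start from shared knowledge
            for _ in range(evals_per_epoch):  # stand-in for one eSS epoch
                step = rng.normal(0.0, 0.1, dim) * (ub - lb)
                cand = np.clip(best + step, lb, ub)
                if f(cand) < f(best):
                    best = cand
            results.append(best)
        # master: merge thread results into the global reference set
        global_ref = sorted(global_ref + results, key=f)[:eta]
    return min(global_ref, key=f)
```

In the real method each slave would also contribute its RefSet (not just its best point), and the exchange would happen through inter-process communication rather than a shared list.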
Results and discussion
As a preliminary validation, we first applied CeSS to a problem taken from the competition on Large Scale Global Optimization of the 2012 IEEE World Congress on Computational Intelligence[34]. Testing the performance of our method with this problem has two advantages: first, the benchmark is well known in the large-scale global optimization community, which facilitates the critical examination of the results; and second, the objective function selected from the benchmark has a smaller computational cost than the ones from our models (despite having one thousand parameters), which allows us to carry out more extensive tests in less time. The results are shown in the supplementary material, see Additional file 1. From these results we conclude that eSS performs well on the benchmark problem, and that its performance is considerably improved by using its cooperative version, CeSS. This confirms the validity of the proposed approach for large-scale global optimization problems.
Overview of model features
                    Model 1     Model 2
Levels              Metabolic   Metabolic, transcriptional
Parameters          918         178
States              115         47
Observables         115         47
Measures            1150        7614
Static / Dynamic    Static      Dynamic
Model 1: E. coli CCM
where u is an enzyme level (mmol), ${k}_{A}^{M}$, ${k}_{B}^{M}$ and ${k}_{C}^{M}$ are reactant constants (mM), and k^{±} are turnover rates (s^{−1}). The CM rate law resembles the convenience kinetics, with a slight difference for molecularities m^{±} ≠ 1 (in convenience kinetics, the denominator term for product C would read $1+\frac{c}{{k}_{C}^{M}}+{\left(\frac{c}{{k}_{C}^{M}}\right)}^{2}$).
The resulting model has 918 parameters: internal metabolite concentrations; activation, inhibition, and Michaelis-Menten constants; and velocity and equilibrium constants. Their values are given in the supplementary material, see Additional file 2. They are estimated by least-squares minimization of the difference between the model predictions and artificially generated data for internal metabolite concentrations (49) and fluxes (66), yielding a state vector with 115 elements.
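The cost function for this model is then a plain sum of squared residuals over the 115 measured quantities. A minimal sketch, with variable names and the two example entries ours rather than taken from the model:

```python
def least_squares_cost(predicted, measured):
    """Sum of squared residuals between model predictions and the
    artificial data (49 concentrations + 66 fluxes = 115 entries
    for Model 1); both arguments map quantity names to values."""
    return sum((predicted[k] - measured[k]) ** 2 for k in measured)

# hypothetical two-entry example: one concentration, one flux
data = {"glucose": 1.2, "v_pgi": 0.8}
cost = least_squares_cost({"glucose": 1.4, "v_pgi": 0.8}, data)
```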
Suspecting that a more sophisticated method may be able to find a better solution, we test our enhanced Scatter Search (eSS) algorithm. Since the multistart calculations were carried out on a single processor, with a total running time of 10 days, it is fair to compare the results with those obtained by 10 processors in one day. We thus select 10 different settings for the eSS algorithm and launch the corresponding 10 optimizations. Since eSS provides the possibility of launching local searches, we select the local method that yields the best results with the multistart procedure, DHC. After one day, the best objective function value achieved is f_{best} = 7.86, which represents a reduction of 73% with respect to what was obtained with the multistart of local searches.
Model 2: E. coli metabolic model including enzymatic and transcriptional regulation
The model calibrated in the previous subsection was purely metabolic. Here we consider a previously published E. coli model[18] that also takes into account the enzymatic and transcriptional regulation layer. It consists of 47 ODEs and 193 parameters (affinity constants, specific activities, Hill coefficients, growth rates, expression rates, etc.), of which 178 are considered unknown and need to be estimated. We have reformulated the model to use it in an optimization context, where the objective function to be minimized is the difference between the simulated concentration profiles obtained with the nominal and the estimated parameters. Upper and lower bounds for the parameters are fixed to values 10 times larger and 10 times smaller than the nominal values, except for the Hill coefficients, which are assumed to lie between 0.5 and 4. More information is included as supplementary material, see Additional file 3.
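The bound construction described above can be sketched as follows; the function name and the example parameters are hypothetical, not taken from the model:

```python
def parameter_bounds(nominal, hill_coefficients):
    """Bounds as described in the text: [nominal/10, 10*nominal] for
    ordinary parameters, and a fixed [0.5, 4] for Hill coefficients."""
    lower, upper = {}, {}
    for name, value in nominal.items():
        if name in hill_coefficients:
            lower[name], upper[name] = 0.5, 4.0
        else:
            lower[name], upper[name] = value / 10.0, value * 10.0
    return lower, upper

# hypothetical parameters: an affinity constant and a Hill coefficient
lb, ub = parameter_bounds({"K_glc": 2.0, "n_hill": 2.0}, {"n_hill"})
```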
This model was created to reproduce the way in which E. coli adapts to changing carbon sources. With this aim, it is subjected to a sequence of three consecutive environments, where the carbon source is first glucose, then acetate, and finally a mixture of both. Under these conditions, the 47 concentration profiles are sampled every 1000 seconds, for a total of 162 time points (45 hours). Therefore the overall number of available samples is 47 × 162 = 7614. The equations are integrated using LSODE (Livermore Solver for Ordinary Differential Equations)[36], which is suited for stiff systems. Integrating these equations is a computationally expensive task; one evaluation of the objective function takes several seconds.
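Structurally, each evaluation of this objective therefore couples an ODE simulation with a least-squares comparison of the sampled profiles. The sketch below uses a single toy decay ODE and a forward-Euler integrator purely to show that structure; the actual model integrates 47 stiff ODEs with LSODE, for which Euler would be a poor choice:

```python
def simulate(params, x0, t_end, dt):
    """Forward-Euler integration of a toy decay ODE dx/dt = -k*x.
    (Illustrative only; stiff systems require an implicit solver
    such as LSODE, as used in the paper.)"""
    k = params["k"]
    x, trajectory, t = x0, [], 0.0
    while t < t_end:
        trajectory.append(x)
        x += dt * (-k * x)  # one explicit Euler step
        t += dt
    return trajectory

def dynamic_cost(params, nominal, x0=1.0, t_end=10.0, dt=0.01, every=100):
    """Least squares between profiles simulated with the estimated and
    the nominal parameters, sampled at regular intervals (cf. the
    1000-second sampling grid used for the real model)."""
    estimated = simulate(params, x0, t_end, dt)[::every]
    reference = simulate(nominal, x0, t_end, dt)[::every]
    return sum((a - b) ** 2 for a, b in zip(estimated, reference))
```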
It must be noted that in[18] a divide-and-conquer approach was adopted for estimating the parameter values of the rate equations. In this approach, the large-scale optimization problem is divided into a set of small subproblems, where only a few parameters are estimated at a time. This reduces the complexity of the task and, consequently, the computation times. However, it is an ad hoc procedure that cannot be applied to most systems biology problems. Here we adopt a general-purpose, large-scale global optimization approach that does not require any special structure in the model equations.
Next, the enhanced Scatter Search (eSS) algorithm is tested. For this particular problem we decided to use eSS with DHC as the local solver because, once again, it offers the best balance between solution quality and computational time. After selecting 10 different settings and letting the corresponding 10 optimizations run for 24 hours, the best objective function value achieved is f_{best} = 2.37·10^{2} and the worst is f_{best} = 1.48·10^{3}. Thus the eSS algorithm clearly outperforms any local method.
Finally, we would like to comment on the default values recommended for the search parameters of the cooperative strategy. The main potential disadvantage of using such default settings for a new problem is that the method could perform worse in two extreme situations: easy (mildly nonconvex) problems, where a simple multistart local method could perform very well, and extremely hard (highly nonconvex) problems, where there is no structure at all and local searches do not help. Since for each new problem we do not have any a priori information about the topology of the search space, we simply recommend a broad spectrum of values for the aggressiveness of the threads, so that even in these extreme situations the cooperative strategy would be able to solve the problem. Of course, any a posteriori information about the topology, obtained after a few runs with the default settings, can be exploited by tuning Ψ. This reasoning applies to new arbitrary problems; we should add that for the particular large-scale parameter estimation problems considered above, the default recommended values have shown good performance.
Conclusions
Summary
Typically, systems biology models have a large number of parameters and are highly nonlinear. Furthermore, the information content of the available experimental data is frequently scarce. As a consequence, it is hard to find the set of parameter values that gives the best fit to experimental data. This task, known as model calibration, is commonly formulated as an optimization problem, which in these cases must be solved with global optimization techniques.
Here we have presented a new method for solving this problem, called Cooperative enhanced Scatter Search (CeSS), which is specifically designed to profit from a parallel environment with several processors. Each of the processors executes a program (or “thread”) which implements the enhanced Scatter Search (eSS) algorithm[24, 25]. This is a state of the art metaheuristic capable of competing successfully with other global optimization methods. The key feature of the strategy presented here is the cooperation between threads, which means that they exchange information at certain fixed instants. This information consists of the reference set of solutions they have found so far, which depends on the best solution found, the shape of the solution space, and the settings of each algorithm.
We have tested this strategy with two large-scale E. coli models of different sizes and characteristics. The first one models the central carbon metabolism (CCM) using a recently proposed common modular rate law[11]. The second is a previously published model that combines metabolism with a transcriptional regulatory layer[18]. In both cases the presented method clearly outperformed several state of the art techniques. The performance and capabilities of the method have also been evaluated using benchmark problems of large-scale global optimization, with excellent results.
Insights
The results for the cooperative strategy presented here show how cooperation between individual parallel search threads modifies the systemic properties of the individual algorithms, improving their performance and outperforming other competitive methods. Importantly, it is a general-purpose framework that can be easily applied to any model calibration problem. Furthermore, the underlying cooperative parallel mechanism is relatively simple and can be extended to incorporate other global and local search solvers, as well as specific structural information for particular classes of problems.
Declarations
Acknowledgements
We are grateful for the financial aid received from the EU through the project “BioPreDyn” (EC FP7-KBBE-2011-5, grant number 289434), from the Ministerio de Economía y Competitividad (and the FEDER) through the projects “MultiScales” (DPI2011-28112-C04-03 and DPI2011-28112-C04-04), and from the CSIC intramural project “BioREDES” (PIE 201170E018). We acknowledge support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI). We thank Wolfram Liebermeister for supplying the metabolic model of E. coli (“Model 1”).
Authors’ Affiliations
References
 van Riel N: Dynamic modelling and analysis of biochemical networks: mechanism-based models and model-based experiments. Briefings Bioinf 2006, 7(4):364. 10.1093/bib/bbl040
 Stelling J: Mathematical models in microbial systems biology. Curr Opin Microbiol 2004, 7(5):513-518. 10.1016/j.mib.2004.08.004
 Terzer M, Maynard N, Covert M, Stelling J: Genome-scale metabolic networks. Wiley Interdisciplinary Rev: Syst Biol Med 2009, 1(3):285-297. 10.1002/wsbm.37
 Banga J, Balsa-Canto E: Parameter estimation and optimal experimental design. Essays Biochem 2008, 45:195. 10.1042/BSE0450195
 Jaqaman K, Danuser G: Linking data to models: data regression. Nat Rev Mol Cell Biol 2006, 7(11):813-819. 10.1038/nrm2030
 Ashyraliyev M, Fomekong-Nanfack Y, Kaandorp J, Blom J: Systems biology: parameter estimation for biochemical models. FEBS J 2008, 276(4):886-902.
 Balsa-Canto E, Alonso A, Banga JR: An iterative identification procedure for dynamic modeling of biochemical networks. BMC Syst Biol 2010, 4:11. 10.1186/1752-0509-4-11
 Jamshidi N, Palsson B: Mass action stoichiometric simulation models: incorporating kinetics and regulation into stoichiometric models. Biophys J 2010, 98(2):175-185. 10.1016/j.bpj.2009.09.064
 Liebermeister W, Klipp E: Bringing metabolic networks to life: convenience rate law and thermodynamic constraints. Theor Biol Med Modell 2006, 3:41. 10.1186/1742-4682-3-41
 Liebermeister W, Klipp E: Bringing metabolic networks to life: integration of kinetic, metabolic, and proteomic data. Theor Biol Med Modell 2006, 3:42. 10.1186/1742-4682-3-42
 Liebermeister W, Uhlendorf J, Klipp E: Modular rate laws for enzymatic reactions: thermodynamics, elasticities, and implementation. Bioinformatics 2010, 26(12):1528-1534. 10.1093/bioinformatics/btq141
 Tran L, Rizk M, Liao J: Ensemble modeling of metabolic networks. Biophys J 2008, 95(12):5606-5617. 10.1529/biophysj.108.135442
 Covert M, Knight E, Reed J, Herrgard M, Palsson B: Integrating high-throughput and computational data elucidates bacterial networks. Nature 2004, 429(6987):92-96. 10.1038/nature02456
 Covert M, Xiao N, Chen T, Karr J: Integrating metabolic, transcriptional regulatory and signal transduction models in Escherichia coli. Bioinformatics 2008, 24(18):2044. 10.1093/bioinformatics/btn352
 Lee JM, Gianchandani E, Eddy J, Papin J: Dynamic analysis of integrated signaling, metabolic, and regulatory networks. PLoS Comput Biol 2008, 4(5):e1000086. 10.1371/journal.pcbi.1000086
 Smallbone K, Simeonidis E, Broomhead D, Kell D: Something from nothing: bridging the gap between constraint-based and kinetic modelling. FEBS J 2007, 274(21):5576-5585. 10.1111/j.1742-4658.2007.06076.x
 Smallbone K, Simeonidis E, Swainston N, Mendes P: Towards a genome-scale kinetic model of cellular metabolism. BMC Syst Biol 2010, 4:6. 10.1186/1752-0509-4-6
 Kotte O, Zaugg J, Heinemann M: Bacterial adaptation through distributed sensing of metabolic fluxes. Mol Syst Biol 2010, 6:355.
 Chen W, Schoeberl B, Jasper P, Niepel M, Nielsen U, Lauffenburger D, Sorger P: Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Mol Syst Biol 2009, 5:239.
 Banga JR: Optimization in computational systems biology. BMC Syst Biol 2008, 2:47. 10.1186/1752-0509-2-47
 Jostins L, Jaeger J: Reverse engineering a gene network using an asynchronous parallel evolution strategy. BMC Syst Biol 2010, 4:17. 10.1186/1752-0509-4-17
 Rodriguez-Fernandez M, Egea J, Banga J: Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinf 2006, 7:483. 10.1186/1471-2105-7-483
 Egea JA, Rodríguez-Fernández M, Banga JR, Martí R: Scatter search for chemical and bioprocess optimization. J Global Optimization 2007, 37(3):481-503. 10.1007/s10898-006-9075-3
 Egea JA, Balsa-Canto E, García MSG, Banga JR: Dynamic optimization of nonlinear processes with an enhanced scatter search method. Ind & Eng Chem Res 2009, 48:4388-4401. 10.1021/ie801717t
 Egea JA, Martí R, Banga JR: An evolutionary method for complex-process optimization. Comput Operations Res 2010, 37(2):315-324. 10.1016/j.cor.2009.05.003
 Glover F, Laguna M, Martí R: Fundamentals of scatter search and path relinking. Control and Cybernetics 2000, 39(3):653-684.
 Beyer HG, Schwefel HP: Evolution strategies - a comprehensive introduction. Nat Comput 2002, 1:3-52. 10.1023/A:1015059928466
 Banga JR, Moles CG, Alonso AA: Global optimization of bioprocesses using stochastic and hybrid methods. Front Global Optimization 2003, 74:45-70.
 De La Maza M, Yuret D: Dynamic hill climbing. AI Expert 1994, 9:26-26.
 García-López F, Melián-Batista B, Moreno-Pérez JA, Moreno-Vega M: Parallelization of the scatter search for the p-median problem. Parallel Computing 2002, 29(5):575-589.
 Vrugt JA, Robinson B: Improved evolutionary optimization from genetically adaptive multimethod search. Proc Natl Acad Sci USA 2007, 104(3):708-711. 10.1073/pnas.0610471104
 Crainic T, Toulouse M: Parallel strategies for meta-heuristics. Handb Metaheuristics 2010, 57:497-541.
 Toulouse M, Crainic TG, Sansó B: Systemic behavior of cooperative search algorithms. Parallel Comput 2004, 30:57-79. 10.1016/j.parco.2002.07.001
 IEEE World Congress on Computational Intelligence (CEC@WCCI-2012): Competition on Large Scale Global Optimization. 2012. http://staff.ustc.edu.cn/~ketang/cec2012/lsgo_competition.htm
 Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 2000, 28:27. 10.1093/nar/28.1.27
 Hindmarsh AC: LSODE and LSODI, two new initial value ordinary differential equation solvers. ACM SIGNUM Newsletter 1980, 15(4):10-11. 10.1145/1218052.1218054
 EMBL-EBI: BioModels Database. http://www.ebi.ac.uk/biomodelsmain/
 Yang Z, Tang K, Yao X: Large scale evolutionary optimization using cooperative coevolution. Inf Sci 2008, 178(15):2985-2999. 10.1016/j.ins.2008.02.017
 Yang Z, Tang K, Yao X: Multilevel cooperative coevolution for large scale optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2008, IEEE World Congress on Computational Intelligence). IEEE; 2008:1663-1670.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.