Surrogate-Based Optimization

Surrogate-based optimization is a class of optimization methodologies that use surrogate modeling techniques to quickly find local or global optima. It provides an optimization framework in which conventional optimization algorithms, e.g. gradient-based or evolutionary algorithms, are only used for sub-optimization.

For optimization problems, surrogate models can be regarded as approximation models for the cost function and the state function, built from sampled data obtained by randomly probing the design space. Once the surrogate models are built, an optimization algorithm can search, based on the surrogates, for the new candidate that is most likely to be the optimum. Since a prediction with a surrogate model is generally much cheaper than a run of the numerical analysis code, the computational cost of the search over the surrogate models is usually negligible. Surrogate modeling refers to the technique of using the sampled data to build models that are sufficient to predict the output of an expensive function at untried points of the feasible space. Thus, how to choose sample points, how to build surrogate models, and how to evaluate their accuracy are the key issues of surrogate-based optimization.

(Figure: randomly sampled points from the design space)

Fit a regressor and find a (local) minimum:

(Figure: fitted regressor and its (local) minimum)

The following algorithm summarizes the implemented surrogate-based optimization procedure for an expensive function \(f\) that can only be evaluated MaxIter times, with at least MinEvals of those being random evaluations.

Note

set \(E=\emptyset\), \(NumIter=0\)

while \(NumIter<MaxIter\) do

    sample a random point \(x_0\) from the feasible space

    if \(\#(E) < MinEvals\) then
        update \(E=E\cup\{(x_0, f(x_0))\}\)
    else
        find a surrogate \(\hat{f}\) that fits the points in \(E\)
        find a minimum \(x_*\) of \(\hat{f}\) using \(x_0\) as the initial point (if required)
        evaluate \(f\) at \(x_*\) and update \(E=E\cup\{(x_*, f(x_*))\}\)

    update \(NumIter = NumIter+1\)

return the pair \((x, y)\in E\) with the lowest value of \(y\) as the approximation of the minimum of \(f\)

Clearly, there are various ways to accomplish the steps of the above algorithm: how to sample a new point \(x_0\), how to fit the surrogate \(\hat{f}\), how to minimize it, and when to rely on the surrogate instead of a direct evaluation. One illustrative set of choices is sketched below.
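As a concrete illustration of these choices, the following sketch implements the loop above with uniform random sampling inside box bounds, a degree-3 polynomial surface fitted with scikit-learn as the surrogate, and scipy.optimize.minimize for the sub-optimization. The function sbo_minimize and its arguments are illustrative only and are not part of SKSurrogate's API.

import numpy as np
from scipy.optimize import minimize
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def sbo_minimize(f, bounds, max_iter=30, min_evals=5, seed=0):
    """Illustrative surrogate-based minimization of `f` over the box `bounds`."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    E = []                                          # evaluated pairs (x, f(x))
    for _ in range(max_iter):
        x0 = rng.uniform(lo, hi)                    # sample a random feasible point
        if len(E) < min_evals:
            E.append((x0, f(x0)))                   # too little data yet: evaluate f directly
        else:
            X = np.array([x for x, _ in E])
            y = np.array([v for _, v in E])
            surrogate = make_pipeline(PolynomialFeatures(3), LinearRegression())
            surrogate.fit(X, y)                     # fit the surrogate to the evaluated points
            res = minimize(lambda z: surrogate.predict(z.reshape(1, -1))[0],
                           x0, bounds=bounds)       # minimize the surrogate starting at x0
            E.append((res.x, f(res.x)))             # evaluate the expensive f at the candidate
    return min(E, key=lambda p: p[1])               # best pair found so far

# usage: a cheap stand-in for an "expensive" objective
best_x, best_y = sbo_minimize(lambda x: float(np.sum(x ** 2)), bounds=[(-3., 3.), (-3., 3.)])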

Sampling

Three different sampling methods have been implemented, and the SurrogateSearch class also accepts user-defined sampling classes that follow a certain signature.

CompactSample

This class samples a random point from the feasibility set.

BoxSample

This class samples a random point from a cube centered around a given point (usually the last point \(f\) was evaluated at) with a given edge length. It makes sure that the sample belongs to the feasibility set.

The length of the edges of the cube can be provided by setting init_radius (default: 2.).

A contraction ratio can be provided by setting contraction (it should be greater than 0 and less than 1) to shrink the volume of the cube and ensure fast convergence to a (local) optimum (default: 0.9).

SphereSample

This class samples a random point from a sphere centered around a given point (usually the last point \(f\) was evaluated at) with a given radius. It makes sure that the sample belongs to the feasibility set.

The radius of the sphere can be provided by setting init_radius (default: 2.).

A contraction ratio can be provided by setting contraction (it should be greater than 0 and less than 1) to shrink the radius of the sphere and ensure fast convergence to a (local) optimum (default: 0.9).
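For instance, assuming the radius is contracted once per evaluation, the defaults give a sampling radius of about \(2\times 0.9^{k}\) after \(k\) evaluations, i.e. roughly \(0.24\) after 20 evaluations, so later samples concentrate tightly around the current centre.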

Note

The sampling method can be passed to an instance of SurrogateSearch via the sampling parameter. Along with the sampling class, radius, contraction, ineq, and bounds may be provided to be used by the sampling class. ineq is a list of callables representing the constraints, and bounds is a list of tuples of real numbers giving the bounds on each variable.

Tip

A user-defined sampling class should have the following structure:

class UserSample(object):
    def __init__(self, **kwargs):
        pass

    def check_constraints(self, point):
        """
        Checks the constraints on the sample, if any were provided;
        `point` is the candidate to be checked;
        should return a boolean: True if all constraints hold, False otherwise.
        """
        pass

    def sample(self, centre, cntrctn=1.):
        """
        Samples a point out of a sphere centered at `centre`;

        `centre` is a `numpy.array`, the center of the sphere;
        `cntrctn` is a `float`, a customized contraction factor;
        returns a `numpy.array`, the new sample.
        """
        pass
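As an illustration, a minimal concrete sampler following this template could draw points from a sphere that shrinks on every call and reject candidates that violate the bounds or constraints. The keyword names (bounds, ineq, init_radius, contraction) mirror the note above; the convention that each constraint callable is non-negative at feasible points is an assumption, and the class itself is only a sketch, not the library's BoxSample or SphereSample implementation.

import numpy as np

class ShrinkingSphereSample(object):
    """Illustrative sampler: draws points from a sphere around `centre` whose
    radius shrinks by the contraction factor on every call."""

    def __init__(self, **kwargs):
        self.bounds = kwargs.get('bounds', None)        # list of (low, high) tuples
        self.ineq = kwargs.get('ineq', [])              # callables assumed to satisfy g(x) >= 0 on feasible points
        self.radius = kwargs.get('init_radius', 2.)
        self.contraction = kwargs.get('contraction', 0.9)

    def check_constraints(self, point):
        if self.bounds is not None:
            for value, (low, high) in zip(point, self.bounds):
                if not (low <= value <= high):
                    return False
        return all(g(point) >= 0 for g in self.ineq)

    def sample(self, centre, cntrctn=1.):
        self.radius *= self.contraction * cntrctn       # shrink the sphere on every call
        while True:
            direction = np.random.randn(len(centre))
            direction /= np.linalg.norm(direction)
            candidate = centre + self.radius * np.random.rand() * direction
            if self.check_constraints(candidate):       # rejection sampling against the feasibility set
                return candidate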

Surrogate models

By default, SurrogateSearch uses a polynomial surface of degree 3 to approximate \(f\) based on the existing data; the surrogate is refitted on each iteration in which a new piece of information about \(f\) is found. Typically, any regressor inheriting from RegressorMixin that implements a fit and a predict method can be used.
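For instance, a degree-3 polynomial surface can be reproduced with a scikit-learn pipeline, and a Gaussian process regressor is a common alternative surrogate. The sketch below only constructs such regressors and checks the required interface; the exact keyword used to hand a custom regressor to SurrogateSearch depends on its constructor and is not shown here.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# a degree-3 polynomial surface, similar in spirit to the default surrogate
poly_surface = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())

# an alternative surrogate: a Gaussian process regressor
# (see the note on Bayesian Optimization further below)
gp_surrogate = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

# both expose the fit/predict interface the search relies on
for model in (poly_surface, gp_surrogate):
    assert hasattr(model, 'fit') and hasattr(model, 'predict')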

Optimizer

A scipy optimizer can be used to find a minimum of the surrogate at each iteration. Note that if ineqs is not None, then most scipy optimizers cannot be used. The optimizers that work well with constraints include 'SLSQP' and 'COBYLA'.

An alternative to the scipy optimizers is 'Optimithon'.
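For illustration, the snippet below fits a toy degree-3 surrogate on a handful of random evaluations and then minimizes its prediction with 'SLSQP' under an inequality constraint; the data, the constraint, and the quadratic stand-in for \(f\) are all made up for the example.

import numpy as np
from scipy.optimize import minimize
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# fit a toy surrogate on a handful of random evaluations
rng = np.random.default_rng(0)
X = rng.uniform(-2., 2., size=(20, 2))
y = np.sum((X - 0.5) ** 2, axis=1)                     # stand-in for the expensive f-values
surrogate = make_pipeline(PolynomialFeatures(3), LinearRegression()).fit(X, y)

# minimize the surrogate's prediction under an illustrative inequality constraint
objective = lambda v: surrogate.predict(np.asarray(v).reshape(1, -1))[0]
constraints = [{'type': 'ineq', 'fun': lambda v: 1. - np.sum(v ** 2)}]   # stay inside the unit disc
res = minimize(objective, x0=np.zeros(2), method='SLSQP', constraints=constraints)
x_star = res.x                                          # the next candidate at which the expensive f would be evaluated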

Hyperparameter Optimization

Hyperparameter optimization in machine learning with respect to a given performance measure (e.g., accuracy, f1, auc, …) is usually a computationally expensive task, and thus fits within the scope of surrogate-based optimization. In fact, this is the main reason the project exists. Therefore, there is a special class designed for hyperparameter optimization of machine learning methods that follow the schema of the very popular machine learning library scikit-learn. The class structsearch.SurrogateRandomCV is a substitute for scikit-learn's GridSearchCV or RandomizedSearchCV. An instance of SurrogateRandomCV takes an estimator, like GridSearchCV and RandomizedSearchCV do, and a params parameter, analogous to param_grid or param_distributions, which determines the (ranges of) values each argument of the estimator can take. The difference is that it accepts not only discrete lists of values for each parameter, but also ranges of integers and real numbers. params is a dictionary whose keys are the estimator's arguments and whose values are objects of the following types:

  • Real(a, b): an interval of real numbers between \(a\) and \(b\);
  • Integer(a, b): an interval of integers between \(a\) and \(b\);
  • Categorical(list): a list of discrete values;
  • HDReal(a, b): an n-dimensional box of real numbers corresponding to the classification groups (e.g., class_weight); a is the tuple of lower bounds and b is the tuple of upper bounds.

Example: The following code searches for the best hyperparameter values for an SVC:

from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer   # example data, just to make the snippet runnable
from SKSurrogate import *

# any labelled data set can be used here
X, y = load_breast_cancer(return_X_y=True)

clf = SVC()
params = {'C': Real(1.e-5, 10),
          'kernel': Categorical(['poly', 'rbf']),
          'degree': Integer(1, 4),
          'gamma': Real(1.e-5, 10),
          'class_weight': HDReal((1.e-3, 1.e-3), (10., 10.))}
srch = SurrogateRandomCV(clf, params)
srch.fit(X, y)
print(srch.best_estimator_)

Note

It is worth mentioning that using a Gaussian process regressor as the regressor simulates a particular variation of the optimization method known as Bayesian Optimization.

Bayesian Optimization is the method employed by the popular package skopt. The interface implemented for SurrogateRandomCV is very similar to that of skopt.BayesSearchCV, so one could use the same code for both, provided the imports are adjusted accordingly.
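For example, the SVC search above could be rewritten for skopt with little more than an import change (using the same X and y); note that HDReal has no direct counterpart in skopt.space, so class_weight is dropped in this sketch.

from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real, Integer, Categorical

clf = SVC()
params = {'C': Real(1.e-5, 10),
          'kernel': Categorical(['poly', 'rbf']),
          'degree': Integer(1, 4),
          'gamma': Real(1.e-5, 10)}
srch = BayesSearchCV(clf, params, n_iter=32)
srch.fit(X, y)
print(srch.best_estimator_)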

Tip

The class SurrogateRandomCV works with intervals to handle the hyperparameters, and the current sampling classes do not impose constraints on SurrogateSearch beyond the ranges of the parameters; therefore alternative scipy.minimize solvers, such as 'L-BFGS-B', 'TNC', and 'SLSQP', can be used as well.