Population Based Bandits

class orion.algo.pbt.pb2.PB2(space, seed=None, population_size=50, generations=10, exploit=None, fork_timeout=60)[source]

Population Based Bandits

Warning: PB2 is broken in current version v0.2.4. We are working on a fix to be released in v0.2.5, ETA July 2022.

Population Based Bandits is a variant of Population Based Training that uses a probabilistic model to guide the search instead of relying on purely random perturbations. PB2 models the optimization curves during training with a time-varying Gaussian process. This implementation is based on the ray-tune implementation. Oríon's version additionally supports discrete and categorical dimensions, and offers better resiliency to broken trials through back-tracking.

See PBT documentation for more information on how to use PBT algorithms.

For more information on the algorithm, see original paper at https://arxiv.org/abs/2002.02518.

Parker-Holder, Jack, Vu Nguyen, and Stephen J. Roberts. “Provably efficient online hyperparameter optimization with population-based bandits.” Advances in Neural Information Processing Systems 33 (2020): 17200-17211.
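As a sketch of how the options below fit together, the algorithm section of an experiment configuration might look like the following. The exact nesting under a "pb2" key follows Orion's usual algorithm-config convention and is an assumption here; parameter names are taken from the signature above.

```python
# Hedged sketch: a PB2 algorithm configuration dict. The "pb2" key and
# nesting are assumptions based on Orion's general algorithm-config style.
pb2_config = {
    "pb2": {
        "seed": 1,
        "population_size": 50,  # trials executed at the lowest fidelity
        "generations": 10,      # branchings from lowest to highest fidelity
        "fork_timeout": 60,     # seconds before SuggestionTimeout
    }
}
```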

space: `orion.algo.space.Space`

Optimisation space with priors for each dimension.

seed: None, int or sequence of int

Seed for the random number generator used to sample new trials. Default: None

population_size: int, optional

Size of the population. No trial will be continued until population_size trials have been executed at the lowest fidelity. If a trial breaks during execution at the lowest fidelity, the algorithm will sample a new trial, keeping the population of non-broken trials at population_size. For efficiency, it is better to have fewer workers running than population_size. Default: 50.

generations: int, optional

Number of generations, from the lowest fidelity to the highest. This determines how many branchings occur during the execution of PBT. Default: 10

exploit: dict or None, optional

Configuration for a pbt.exploit.BaseExploit object that determines whether a trial should be exploited. If None, the default configuration is a PipelineExploit with BacktrackExploit and TruncateExploit.

fork_timeout: int, optional

Maximum number of seconds an attempt to mutate a trial may take; beyond this, algorithm.suggest() raises SuggestionTimeout. Default: 60
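For illustration, the default exploit configuration described above could be written as a dict along these lines. The `of_type` key convention follows Orion's PBT documentation; treat the exact spelling of the class names and keys as an assumption.

```python
# Hedged sketch: an exploit config matching the described default,
# a PipelineExploit chaining BacktrackExploit and TruncateExploit.
exploit_config = {
    "of_type": "PipelineExploit",
    "exploit_configs": [
        {"of_type": "BacktrackExploit"},
        {"of_type": "TruncateExploit"},
    ],
}
```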


Summary:

configuration: Return tunable elements of this algorithm in a dictionary form appropriate for saving.

state_dict: Return a state dict that can be used to reset the state of the algorithm.

seed_rng: Seed the state of the random number generator.

set_state: Reset the state of the algorithm based on the given state_dict

property configuration

Return tunable elements of this algorithm in a dictionary form appropriate for saving.

seed_rng(seed: int | Sequence[int] | None) None[source]

Seed the state of the random number generator.

seed: int, sequence of int, or None

Seed for the random number generator.

set_state(state_dict: dict) None[source]

Reset the state of the algorithm based on the given state_dict

property state_dict: dict

Return a state dict that can be used to reset the state of the algorithm.

PB2 Utils

PB2 Utils from Ray tune package, used in PB2 explore method.

Reference (2022/02/18): https://github.com/ray-project/ray/blob/master/python/ray/tune/schedulers/pb2_utils.py

class orion.algo.pbt.pb2_utils.TVSquaredExp(input_dim, variance=1.0, lengthscale=1.0, epsilon=0.0, active_dims=None)[source]

Time varying squared exponential kernel. For more info see the TV-GP-UCB paper: http://proceedings.mlr.press/v51/bogunovic16.pdf
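The kernel from the TV-GP-UCB paper damps a standard squared-exponential kernel by a time-decay factor, so that observations further apart in time are less correlated. A minimal NumPy sketch of the idea (not the GPy-based class above; parameter names mirror the signature):

```python
import numpy as np

def tv_squared_exp(x1, t1, x2, t2, variance=1.0, lengthscale=1.0, epsilon=0.0):
    # Squared-exponential part over the input dimension.
    se = variance * np.exp(-((x1 - x2) ** 2) / (2.0 * lengthscale ** 2))
    # Time-varying damping: correlation decays as |t1 - t2| grows.
    # With epsilon = 0 this reduces to a plain squared-exponential kernel.
    return se * (1.0 - epsilon) ** (np.abs(t1 - t2) / 2.0)
```

With epsilon = 0 the time factor is 1 and the kernel is stationary in time; larger epsilon makes old observations fade faster.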





orion.algo.pbt.pb2_utils.UCB(m, m1, x, fixed, kappa=0.5)[source]

UCB acquisition function.

Interesting points to note:

  1. We concatenate with the fixed points because we are not optimizing with respect to these. These are the Reward and Time, which we cannot change; we want to find the best hyperparameters given the reward and time.

  2. We use m to get the mean and m1 to get the variance. If we already have trials running, then m1 contains this information. This reduces the variance at points currently running, even though we do not have their labels. Ref: https://jmlr.org/papers/volume15/desautels14a/desautels14a.pdf
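The acquisition itself is the usual upper-confidence-bound trade-off between predicted mean and uncertainty. A generic sketch (the actual function above operates on GPy model objects, which are not reproduced here):

```python
import numpy as np

def ucb_value(mean, std, kappa=0.5):
    # Upper confidence bound: exploit the predicted mean, explore
    # where the (possibly variance-reduced) std is still large.
    return np.asarray(mean) + kappa * np.asarray(std)
```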

orion.algo.pbt.pb2_utils.normalize(data, wrt)[source]

Normalize data to the range (0, 1) with respect to the given boundaries (wrt).
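A plain NumPy equivalent, assuming `wrt` supplies the per-column min/max boundaries:

```python
import numpy as np

def normalize(data, wrt):
    # Map each column of `data` into (0, 1) using the column-wise
    # min and max of the reference array `wrt`.
    data, wrt = np.asarray(data, dtype=float), np.asarray(wrt, dtype=float)
    return (data - wrt.min(axis=0)) / (wrt.max(axis=0) - wrt.min(axis=0))
```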

orion.algo.pbt.pb2_utils.optimize_acq(func, m, m1, fixed, num_f)[source]

Optimize acquisition function.

orion.algo.pbt.pb2_utils.select_config(Xraw, yraw, current, newpoint, bounds, num_f)[source]

Selects the next hyperparameter config to try.

This function takes the formatted data, fits the GP model and optimizes the UCB acquisition function to select the next point.

Xraw: np.array

The un-normalized array of hyperparams, Time and Reward

yraw: np.array

The un-normalized vector of reward changes.

current: list

The hyperparameters of trials currently running. This is important so we do not select the same config twice. If there is data here, we fit a second GP that includes it, with fake y labels; this is fine because the GP variance does not depend on the y labels.

newpoint: np.array

The Reward and Time for the new point. We cannot change these as they are based on the new weights.

bounds: dict

Bounds for the hyperparameters. Used to normalize.

num_f: int

The number of fixed params. Almost always 2 (reward+time)


Returns: A vector of new hyperparameters.
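To make the expected shapes concrete, here is an illustrative setup for a call. The column ordering shown (the num_f fixed columns, Time and Reward, followed by the hyperparameter columns) and the hyperparameter name are assumptions for illustration only.

```python
import numpy as np

# Illustrative data only. Assumed row layout: [time, reward, hyperparams...].
num_f = 2  # the two fixed columns: Time and Reward
Xraw = np.array([
    [1.0, 0.50, 0.001],  # t, reward, learning rate (hypothetical hyperparam)
    [2.0, 0.62, 0.010],
])
yraw = np.array([0.12, 0.08])    # un-normalized reward changes per trial
current = []                     # hyperparams of trials currently running
bounds = {"lr": (0.0001, 0.1)}   # hyperparameter bounds used to normalize
```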

orion.algo.pbt.pb2_utils.select_length(Xraw, yraw, bounds, num_f)[source]

Select the number of datapoints to keep, using cross-validation.


Standardize data to be Gaussian N(0, 1) and clip the final values.