Population Based Bandits¶

Population Based Bandits
PB2 Utils

Population Based Bandits ¶

class orion.algo.pbt.pb2.PB2(space, seed=None, population_size=50, generations=10, exploit=None, fork_timeout=60)[source]¶

Population Based Bandits

Warning: PB2 is broken in current version v0.2.4. We are working on a fix to be released in v0.2.5, ETA July 2022.

Population Based Bandits is a variant of Population Based Training using probabilistic model to guide the search instead of relying on purely random perturbations. PB2 implementation uses a time-varying Gaussian process to model the optimization curves during training. This implementation is based on ray-tune implementation. Oríon’s version supports discrete and categorical dimensions, and offers better resiliency to broken trials by using back-tracking.

See PBT documentation for more information on how to use PBT algorithms.

For more information on the algorithm, see original paper at https://arxiv.org/abs/2002.02518.

Parker-Holder, Jack, Vu Nguyen, and Stephen J. Roberts. “Provably efficient online hyperparameter optimization with population-based bandits.” Advances in Neural Information Processing Systems 33 (2020): 17200-17211.

Parameters

space: `orion.algo.space.Space`: Optimisation space with priors for each dimension.
seed: None, int or sequence of int: Seed for the random number generator used to sample new trials. Default: None
population_size: int, optional: Size of the population. No trial will be continued until there are population_size trials executed until lowest fidelity. If a trial is broken during execution at lowest fidelity, the algorithm will sample a new trial, keeping the population of non-broken trials at population_size. For efficiency it is better to have less workers running than population_size. Default: 50.
generations: int, optional: Number of generations, from lowest fidelity to highest one. This will determine how many branchings occur during the execution of PBT. Default: 10
exploit: dict or None, optional: Configuration for a pbt.exploit.BaseExploit object that determines when if a trial should be exploited or not. If None, default configuration is a PipelineExploit with BacktrackExploit and TruncateExploit.
fork_timeout: int, optional: Maximum amount of time in seconds that an attempt to mutate a trial should take, otherwise algorithm.suggest() will raise SuggestionTimeout. Default: 60

Attributes

configuration: Return tunable elements of this algorithm in a dictionary form appropriate for saving.
state_dict: Return a state dict that can be used to reset the state of the algorithm.

Methods

`seed_rng`(seed)	Seed the state of the random number generator.
`set_state`(state_dict)	Reset the state of the algorithm based on the given state_dict

property configuration¶: Return tunable elements of this algorithm in a dictionary form appropriate for saving.

seed_rng(seed: int | Sequence[int] | None) → None[source]¶

Seed the state of the random number generator.

Parameters

seed: int: Integer seed for the random number generator.

set_state(state_dict: dict) → None[source]¶: Reset the state of the algorithm based on the given state_dict

property state_dict: dict¶: Return a state dict that can be used to reset the state of the algorithm.

PB2 Utils ¶

PB2 Utils from Ray tune package, used in PB2 explore method.

Reference (2022/02/18): https://github.com/ray-project/ray/blob/master/python/ray/tune/schedulers/pb2_utils.py

class orion.algo.pbt.pb2_utils.TVSquaredExp(input_dim, variance=1.0, lengthscale=1.0, epsilon=0.0, active_dims=None)[source]¶

Time varying squared exponential kernel. For more info see the TV-GP-UCB paper: http://proceedings.mlr.press/v51/bogunovic16.pdf

Methods

K
Kdiag
update_gradients_full

orion.algo.pbt.pb2_utils.UCB(m, m1, x, fixed, kappa=0.5)[source]¶

UCB acquisition function.

Interesting points to note:

We concat with the fixed points, because we are not optimizing wrt these. This is the Reward and Time, which we can’t change. We want to find the best hyperparameters given the reward and time.
We use m to get the mean and m1 to get the variance. If we already have trials running, then m1 contains this information. This reduces the variance at points currently running, even if we don’t have their label. Ref: https://jmlr.org/papers/volume15/desautels14a/desautels14a.pdf

orion.algo.pbt.pb2_utils.normalize(data, wrt)[source]¶: Normalize data to be in range (0,1), with respect to (wrt) boundaries, which can be specified.

orion.algo.pbt.pb2_utils.optimize_acq(func, m, m1, fixed, num_f)[source]¶: Optimize acquisition function.

orion.algo.pbt.pb2_utils.select_config(Xraw, yraw, current, newpoint, bounds, num_f)[source]¶

Selects the next hyperparameter config to try.

This function takes the formatted data, fits the GP model and optimizes the UCB acquisition function to select the next point.

Parameters

Xraw: np.array: The un-normalized array of hyperparams, Time and Reward
yraw: np.array: The un-normalized vector of reward changes.
current: list: The hyperparams of trials currently running. This is important so we do not select the same config twice. If there is data here then we fit a second GP including it (with fake y labels). The GP variance doesn’t depend on the y labels so it is ok.
newpoint: np.array: The Reward and Time for the new point. We cannot change these as they are based on the new weights.
bounds: dict: Bounds for the hyperparameters. Used to normalize.
num_f: int: The number of fixed params. Almost always 2 (reward+time)

Returns

np.array: A vector of new hyperparameters.

orion.algo.pbt.pb2_utils.select_length(Xraw, yraw, bounds, num_f)[source]¶: Select the number of datapoints to keep, using cross validation

orion.algo.pbt.pb2_utils.standardize(data)[source]¶: Standardize to be Gaussian N(0,1). Clip final values.

Population Based Bandits¶