Population Based Bandits¶
- class orion.algo.pbt.pb2.PB2(space, seed=None, population_size=50, generations=10, exploit=None, fork_timeout=60)[source]¶
Population Based Bandits
Warning: PB2 is broken in current version v0.2.4. We are working on a fix to be released in v0.2.5, ETA July 2022.
Population Based Bandits is a variant of Population Based Training that uses a probabilistic model to guide the search instead of relying on purely random perturbations. The PB2 implementation uses a time-varying Gaussian process to model the optimization curves during training. This implementation is based on the Ray Tune implementation. Oríon’s version supports discrete and categorical dimensions, and offers better resiliency to broken trials by using back-tracking.
See PBT documentation for more information on how to use PBT algorithms.
For more information on the algorithm, see original paper at https://arxiv.org/abs/2002.02518.
Parker-Holder, Jack, Vu Nguyen, and Stephen J. Roberts. “Provably efficient online hyperparameter optimization with population-based bandits.” Advances in Neural Information Processing Systems 33 (2020): 17200-17211.
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default:
None
- population_size: int, optional
Size of the population. No trial will be continued until there are population_size trials executed at the lowest fidelity. If a trial is broken during execution at the lowest fidelity, the algorithm will sample a new trial, keeping the population of non-broken trials at population_size. For efficiency, it is better to have fewer workers running than population_size. Default: 50.
- generations: int, optional
Number of generations, from the lowest fidelity to the highest one. This determines how many branchings occur during the execution of PBT. Default: 10.
- exploit: dict or None, optional
Configuration for a pbt.exploit.BaseExploit object that determines whether a trial should be exploited or not. If None, the default configuration is a PipelineExploit with BacktrackExploit and TruncateExploit.
- fork_timeout: int, optional
Maximum amount of time in seconds that an attempt to mutate a trial may take, otherwise algorithm.suggest() will raise SuggestionTimeout. Default: 60.
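For example, PB2 could be selected in an experiment's YAML configuration like any other Oríon algorithm. This is a sketch: the keys under pb2 mirror the parameters above, and the surrounding layout follows Oríon's standard experiment configuration format.

```yaml
experiment:
  algorithms:
    pb2:
      seed: 1
      population_size: 50
      generations: 10
      fork_timeout: 60
```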
- Attributes
configuration
Return tunable elements of this algorithm in a dictionary form appropriate for saving.
state_dict
Return a state dict that can be used to reset the state of the algorithm.
Methods
seed_rng (seed)
Seed the state of the random number generator.
set_state (state_dict)
Reset the state of the algorithm based on the given state_dict.
- property configuration¶
Return tunable elements of this algorithm in a dictionary form appropriate for saving.
- seed_rng(seed: int | Sequence[int] | None) None [source]¶
Seed the state of the random number generator.
- Parameters
- seed: int
Integer seed for the random number generator.
PB2 Utils¶
PB2 Utils from Ray tune package, used in PB2 explore method.
Reference (2022/02/18): https://github.com/ray-project/ray/blob/master/python/ray/tune/schedulers/pb2_utils.py
- class orion.algo.pbt.pb2_utils.TVSquaredExp(input_dim, variance=1.0, lengthscale=1.0, epsilon=0.0, active_dims=None)[source]¶
Time varying squared exponential kernel. For more info see the TV-GP-UCB paper: http://proceedings.mlr.press/v51/bogunovic16.pdf
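Concretely, the time-varying kernel from the TV-GP-UCB paper multiplies a standard squared exponential kernel over the inputs by a forgetting factor over time. A sketch of the formula, where epsilon is the kernel parameter above (the exact parametrization here is an assumption; see the paper for details):

```latex
k\big((x, t), (x', t')\big) \;=\; \sigma^2 \exp\!\left(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\right) (1 - \epsilon)^{\lvert t - t' \rvert / 2}
```

With epsilon = 0 this reduces to the usual squared exponential kernel; larger epsilon discounts correlations between observations that are far apart in time.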
Methods
K
Kdiag
update_gradients_full
- orion.algo.pbt.pb2_utils.UCB(m, m1, x, fixed, kappa=0.5)[source]¶
UCB acquisition function.
Interesting points to note:
We concat with the fixed points, because we are not optimizing wrt these. This is the Reward and Time, which we can’t change. We want to find the best hyperparameters given the reward and time.
We use m to get the mean and m1 to get the variance. If we already have trials running, then m1 contains this information. This reduces the variance at points currently running, even if we don’t have their label. Ref: https://jmlr.org/papers/volume15/desautels14a/desautels14a.pdf
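The core of a UCB acquisition is the predictive mean plus kappa times the predictive standard deviation. A minimal numpy sketch of that computation, assuming the mean and variance have already been extracted from the two GP models (ucb_sketch is a hypothetical helper, not the actual pb2_utils signature, which operates on the GPy models m and m1 directly):

```python
import numpy as np


def ucb_sketch(mean, var, kappa=0.5):
    """Hypothetical UCB sketch: mean + kappa * standard deviation.

    `mean` would come from the first GP (m) and `var` from the second
    GP (m1), which also conditions on currently running trials.
    """
    return mean + kappa * np.sqrt(var)
```

Conditioning the variance model on running trials (with fake labels) is valid because the GP posterior variance depends only on input locations, not on the observed labels.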
- orion.algo.pbt.pb2_utils.normalize(data, wrt)[source]¶
Normalize data to be in the range (0, 1) with respect to the given boundaries (wrt).
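This is ordinary min-max scaling against the boundaries of a reference array. A minimal sketch of the idea (normalize_sketch is a hypothetical stand-in, not the exact pb2_utils implementation):

```python
import numpy as np


def normalize_sketch(data, wrt):
    """Scale each column of `data` into (0, 1) using the per-column
    min/max of the reference array `wrt`."""
    lo = np.min(wrt, axis=0)
    hi = np.max(wrt, axis=0)
    return (data - lo) / (hi - lo)
```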
- orion.algo.pbt.pb2_utils.optimize_acq(func, m, m1, fixed, num_f)[source]¶
Optimize acquisition function.
- orion.algo.pbt.pb2_utils.select_config(Xraw, yraw, current, newpoint, bounds, num_f)[source]¶
Selects the next hyperparameter config to try.
This function takes the formatted data, fits the GP model and optimizes the UCB acquisition function to select the next point.
- Parameters
- Xraw: np.array
The un-normalized array of hyperparams, Time and Reward
- yraw: np.array
The un-normalized vector of reward changes.
- current: list
The hyperparams of trials currently running. This is important so we do not select the same config twice. If there is data here then we fit a second GP including it (with fake y labels). The GP variance doesn’t depend on the y labels so it is ok.
- newpoint: np.array
The Reward and Time for the new point. We cannot change these as they are based on the new weights.
- bounds: dict
Bounds for the hyperparameters. Used to normalize.
- num_f: int
The number of fixed params. Almost always 2 (Reward and Time).
- Returns
np.array
A vector of new hyperparameters.
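To make the expected inputs concrete, here is a hypothetical toy dataset. The hyperparameter names and the placement of the fixed Time and Reward columns are assumptions for illustration only; consult pb2_utils for the exact column layout.

```python
import numpy as np

# Hypothetical toy inputs for select_config: 3 completed trials,
# two fixed columns (Time, Reward) plus two hyperparameters
# (column order is an assumption for illustration).
Xraw = np.array([
    [0.0, 10.0, 0.10, 64.0],
    [1.0, 12.0, 0.20, 128.0],
    [2.0, 15.0, 0.05, 32.0],
])
yraw = np.array([2.0, 3.0, 1.0])   # un-normalized reward changes, one per trial
newpoint = np.array([3.0, 15.0])   # Time and Reward of the trial being forked
bounds = {"lr": (0.01, 0.5), "batch_size": (16.0, 256.0)}  # hypothetical names
num_f = 2                          # two fixed columns: Time and Reward
```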