Algorithms¶
The default algorithm is a random search based on the probability distribution given in each search parameter’s definition.
Selecting and Configuring¶
In an Oríon configuration YAML file, define:
    experiment:
      algorithm:
        gradient_descent:
          learning_rate: 0.1
In this particular example, the name of the algorithm extension class to be
imported and instantiated is Gradient_Descent, so the lower-case identifier
corresponds to it.
All algorithms have default arguments that should work reasonably well in general.
To tune the algorithm for a specific problem, you can set those arguments in the
YAML file as shown above with learning_rate.
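The mapping between the lower-case YAML identifier and the class name can be pictured with the minimal sketch below. It does not use Oríon's real plugin machinery: the two classes, the `REGISTRY` dictionary and `build_algorithm` are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical stand-ins for algorithm classes; real ones are provided by Oríon and its plugins.
class Gradient_Descent:
    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate


class Random:
    def __init__(self, seed=None):
        self.seed = seed


# Registry keyed by the lower-cased class name, mirroring the YAML identifier.
REGISTRY = {cls.__name__.lower(): cls for cls in (Gradient_Descent, Random)}


def build_algorithm(algorithm_config):
    """Instantiate the single algorithm named in a config mapping."""
    (name, kwargs), = algorithm_config.items()
    # A missing or null argument block means "use the defaults".
    return REGISTRY[name](**(kwargs or {}))


config = {"experiment": {"algorithm": {"gradient_descent": {"learning_rate": 0.1}}}}
algo = build_algorithm(config["experiment"]["algorithm"])
print(type(algo).__name__, algo.learning_rate)
```

Arguments left out of the configuration fall back to the defaults of the class constructor, which is why only the values you want to tune need to be listed.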
Included Algorithms¶
Random Search¶
Random search is the simplest algorithm. It samples from the given priors. That’s it.
Configuration¶
    experiment:
      algorithm:
        random:
          seed: null
- class orion.algo.random.Random(space: Space, seed: int | Sequence[int] | None = None)[source]
An algorithm that samples randomly from the problem’s space.
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default: None
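As a rough illustration of sampling from priors with a seed, the sketch below uses only Python's standard library. The search space, the `priors` dictionary and `sample_trial` are hypothetical and are not part of Oríon's API.

```python
import math
import random


def sample_trial(priors, rng):
    """Draw one value per dimension from its prior."""
    return {name: draw(rng) for name, draw in priors.items()}


# Hypothetical search space: each prior is a function of a random.Random instance.
priors = {
    # log-uniform between 1e-5 and 1e-1
    "lr": lambda rng: math.exp(rng.uniform(math.log(1e-5), math.log(1e-1))),
    # uniform between 0 and 1
    "momentum": lambda rng: rng.uniform(0.0, 1.0),
    # categorical choice
    "optimizer": lambda rng: rng.choice(["sgd", "adam"]),
}

rng = random.Random(42)  # a fixed seed makes the sequence of trials repeatable
trials = [sample_trial(priors, rng) for _ in range(3)]
for trial in trials:
    print(trial)
```

Re-creating the generator with the same seed yields the same trials, whereas an unseeded generator gives a different sequence on each run.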
Grid Search¶
Grid search is one of the simplest algorithms. It can work reasonably well for small search spaces of one or two dimensions but should be avoided for larger search spaces. The search space can be configured in three different ways.
- Default for the lazy. You can set a very large n_values (ex: 100) and the grid will be adjusted so that it results in fewer trials than max_trials as defined in the experiment configuration.
- You can set n_values to the number of points desired per dimension. Note that if this leads to too many trials, the grid will be shrunk to fit below max_trials.
- You can pass a dictionary to n_values specifying the number of points for each dimension. Ex: n_values: {'dim1': 3, 'dim2': 4}.
Note
For categorical dimensions (choices()) all values are used to build the grid. This means
n_values will not be honored. A warning is printed when this happens. Accordingly,
if too many options are provided for the categorical dimensions the grid may lead to more trials
than max_trials. A ValueError will be raised in such a scenario.
Configuration¶
    experiment:
      algorithm:
        gridsearch:
          n_values: 100
- class orion.algo.gridsearch.GridSearch(space: Space, n_values: int | dict[str, int] = 100)[source]
Grid Search algorithm
- Parameters
- n_values: int or dict
Number of trials for each dimension, or dictionary specifying the number of trials for each dimension independently (name, n_values). For categorical dimensions, n_values will not be used, and all categories will be used to build the grid.
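The way n_values and max_trials interact can be sketched as follows for real-valued dimensions only. The shrinking rule used here (repeatedly decrement the largest count until the grid holds no more than max_trials points) is an assumption made for illustration; Oríon's actual adjustment may differ, and `linspace` and `build_grid` are hypothetical helpers.

```python
import itertools


def linspace(low, high, n):
    """n evenly spaced points between low and high (inclusive)."""
    if n == 1:
        return [low]
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def build_grid(bounds, n_values, max_trials):
    """Build a grid over real dimensions, shrinking n_values to fit max_trials."""
    if isinstance(n_values, int):
        n_values = {name: n_values for name in bounds}
    n_values = dict(n_values)

    def size(counts):
        total = 1
        for n in counts.values():
            total *= n
        return total

    # Assumed shrinking rule: repeatedly decrement the largest count.
    while size(n_values) > max_trials and max(n_values.values()) > 1:
        largest = max(n_values, key=n_values.get)
        n_values[largest] -= 1

    axes = [linspace(*bounds[name], n_values[name]) for name in bounds]
    return [dict(zip(bounds, point)) for point in itertools.product(*axes)]


bounds = {"dim1": (0.0, 1.0), "dim2": (0.0, 10.0)}
grid = build_grid(bounds, {"dim1": 3, "dim2": 4}, max_trials=100)
shrunk = build_grid(bounds, 100, max_trials=20)
print(len(grid), len(shrunk))
```

The first call keeps the requested 3 x 4 grid because it already fits, while the second call starts from 100 points per dimension and is reduced until it fits the trial budget.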
Hyperband¶
Hyperband extends the SuccessiveHalving algorithm by providing a way to exploit a fixed budget with different numbers of configurations for the SuccessiveHalving algorithm to evaluate. Each run of SuccessiveHalving is defined as a bracket in Hyperband.
Hyperband requires two inputs: (1) R, the maximum amount of resource that can be allocated to a single configuration, and (2) eta, an input that controls the proportion of configurations discarded in each round of SuccessiveHalving.
To use Hyperband in Oríon, you must specify one parameter with fidelity(low, high, base) as the prior. low will be ignored, high will be taken as the maximum resource R and base will be taken as the reduction factor eta.
The number of epochs can usually be used as the resource, but the algorithm is generic and can be applied to any multi-fidelity setting. That is, you can use training time, specifying the fidelity with --epochs~fidelity(low=1, high=81, base=3) (assuming your script takes this argument on the command line), but you could also use another fidelity such as dataset size, --dataset-size~fidelity(low=500, high=50000) (assuming your script takes this argument and adapts the dataset size accordingly).
Note
Current implementation does not support more than one fidelity dimension.
Configuration¶
    experiment:
      algorithm:
        hyperband:
          seed: null
          repetitions: 1
- class orion.algo.hyperband.Hyperband(space: Space, seed: int | Sequence[int] | None = None, repetitions: int | float = inf)[source]
Hyperband formulates hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.
For more information on the algorithm, see original paper at http://jmlr.org/papers/v18/16-558.html.
Li, Lisha et al. “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization” Journal of Machine Learning Research, 18:1-52, 2018.
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default: None
- repetitions: int
Number of executions for Hyperband. A single execution of Hyperband takes a finite budget of (log(R)/log(eta) + 1) * (log(R)/log(eta) + 1) * R, and repetitions allows you to run multiple executions of Hyperband. Default is numpy.inf, which means to run Hyperband until no new trials can be suggested.
- Attributes
fidelity_index
Compute the dimension name of the space where fidelity is.
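As a worked example of the budget formula given for repetitions, the sketch below evaluates it for fidelity(low=1, high=81, base=3) and lists the brackets. The bracket sizes follow Algorithm 1 of the Hyperband paper cited above, so they are not necessarily the exact numbers Oríon's implementation produces. The helper names are hypothetical, and eta is assumed to be an integer.

```python
import math


def num_halvings(max_resource, eta):
    """Largest integer s such that eta**s <= max_resource.

    Integer arithmetic avoids rounding errors of log(R) / log(eta).
    """
    s = 0
    while eta ** (s + 1) <= max_resource:
        s += 1
    return s


def hyperband_budget(max_resource, eta):
    """Budget of one Hyperband execution: (log(R)/log(eta) + 1)**2 * R."""
    levels = num_halvings(max_resource, eta) + 1
    return levels * levels * max_resource


def bracket_schedule(max_resource, eta):
    """Brackets as lists of (number of trials, resource per trial) rungs."""
    s_max = num_halvings(max_resource, eta)
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) / (s + 1) * eta ** s)  # initial configurations
        r = max_resource / eta ** s                      # initial resource
        rungs = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rungs)
    return brackets


print(hyperband_budget(81, 3))
schedule = bracket_schedule(81, 3)
for rungs in schedule:
    print(rungs)
```

With R = 81 and eta = 3 there are five brackets: the most aggressive one starts 81 trials at 1 unit of resource and keeps a third of them at each rung, while the last one runs 5 trials directly at full fidelity.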
ASHA¶
Asynchronous Successive Halving Algorithm, the asynchronous version of Hyperband, can be roughly interpreted as a sophisticated random search that leverages partial information of the trial execution to concentrate resources on the most promising ones.
The main idea of the algorithm is the following. Given a fidelity dimension, such as the number of epochs to train or the size of the dataset, ASHA samples trials with low fidelity and promotes the most promising ones to the next fidelity level. This makes it possible to execute as few as one trial at full fidelity, which uses resources efficiently.
The most common way of using ASHA is to reduce the number of epochs,
but the algorithm is generic and can be applied to any multi-fidelity setting.
That is, you can use training time, specifying the fidelity with
--epochs~fidelity(low=1, high=100) (assuming your script takes this argument on the command line),
but you could also use another fidelity such as dataset size,
--dataset-size~fidelity(low=500, high=50000) (assuming your script takes this argument and
adapts the dataset size accordingly). The placeholder fidelity(low, high) is a special prior for
multi-fidelity algorithms.
Note
Current implementation does not support more than one fidelity dimension.
Configuration¶
    experiment:
      algorithm:
        asha:
          seed: null
          num_rungs: null
          num_brackets: 1
          repetitions: 1
- class orion.algo.asha.ASHA(space: Space, seed: int | Sequence[int] | None = None, num_rungs: int | None = None, num_brackets: int = 1, repetitions: int | float = inf)[source]
Asynchronous Successive Halving Algorithm
A simple and robust hyperparameter tuning algorithm with solid theoretical underpinnings that exploits parallelism and aggressive early-stopping.
For more information on the algorithm, see original paper at https://arxiv.org/abs/1810.05934.
Li, Liam, et al. “Massively parallel hyperparameter tuning.” arXiv preprint arXiv:1810.05934 (2018)
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default: None
- num_rungs: int, optional
Number of rungs for the largest bracket. If not defined, it will be equal to (base + 1) of the fidelity dimension. In the original paper, num_rungs == log(fidelity.high/fidelity.low) / log(fidelity.base) + 1. Default: log(fidelity.high/fidelity.low) / log(fidelity.base) + 1
- num_brackets: int
Using a grace period that is too small may bias ASHA too strongly towards fast converging trials that do not lead to best results at convergence (stragglers). To overcome this, you can increase the number of brackets, which increases the amount of resource required for optimisation but decreases the bias towards stragglers. Default: 1
- repetitions: int
Number of executions of ASHA. Default is np.inf, which means to run ASHA until no new trials can be suggested.
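The promotion idea behind ASHA, together with the num_rungs formula from the original paper, can be sketched as below. This is a simplified reading of the paper, not Oríon's implementation: `num_rungs` and `promotable` are hypothetical helpers, objectives are assumed to be minimized, and the reduction factor is assumed to be an integer.

```python
import math


def num_rungs(low, high, base):
    """Rungs per the formula log(high / low) / log(base) + 1, rounded down."""
    # The small epsilon guards against log ratios such as 3.9999999999.
    return int(math.log(high / low) / math.log(base) + 1e-9) + 1


def promotable(rung, promoted, base):
    """Trials in the top 1/base of a rung that have not been promoted yet.

    rung maps a trial id to its objective (lower is better).
    """
    ranked = sorted(rung, key=rung.get)
    top_k = len(rung) // base
    return [trial for trial in ranked[:top_k] if trial not in promoted]


rung0 = {"a": 0.9, "b": 0.3, "c": 0.5, "d": 0.7, "e": 0.2, "f": 0.8}
print(num_rungs(low=1, high=81, base=3))
print(promotable(rung0, promoted=set(), base=3))
print(promotable(rung0, promoted={"e"}, base=3))
```

Because promotion only looks at the trials completed so far, a worker never has to wait for a rung to fill up: it either promotes an eligible trial or samples a new one at the lowest fidelity.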
BOHB¶
BOHB integrates a Bayesian Optimization algorithm into Hyperband to select the hyperparameters to try at the first rung of Hyperband brackets. The first batch of trials will be sampled randomly, but subsequent ones will be selected using Bayesian Optimization. See Hyperband for more information on how to use multi-fidelity algorithms.
Note
Current implementation does not support more than one fidelity dimension.
Configuration¶
    experiment:
      algorithm:
        bohb:
          min_points_in_model: 20
          top_n_percent: 15
          num_samples: 64
          random_fraction: 0.33
          bandwidth_factor: 3
          min_bandwidth: 1e-3
          parallel_strategy:
            of_type: StatusBasedParallelStrategy
            strategy_configs:
              broken:
                of_type: MaxParallelStrategy
- class orion.algo.bohb.BOHB(space, seed=None, min_points_in_model=None, top_n_percent=15, num_samples=64, random_fraction=0.3333333333333333, bandwidth_factor=3, min_bandwidth=0.001, parallel_strategy=None)[source]
Bayesian Optimization with HyperBand
This class is a wrapper around the library HpBandSter: https://github.com/automl/HpBandSter.
For more information on the algorithm, see original paper at https://arxiv.org/abs/1807.01774.
Falkner, Stefan, Aaron Klein, and Frank Hutter. “BOHB: Robust and efficient hyperparameter optimization at scale.” In International Conference on Machine Learning, pp. 1437-1446. PMLR, 2018.
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default: None
- min_points_in_model: int
Number of observations to start building a KDE. If None, uses number of dimensions in the search space + 1. Default: None
- top_n_percent: int
Percentage (between 1 and 99) of the observations that are considered good. Default: 15
- num_samples: int
Number of samples to optimize Expected Improvement. Default: 64
- random_fraction: float
Fraction of purely random configurations that are sampled from the prior without the model. Default: 1/3
- bandwidth_factor: float
To encourage diversity, the points proposed to optimize EI, are sampled from a ‘widened’ KDE where the bandwidth is multiplied by this factor. Default: 3
- min_bandwidth: float
To keep diversity, even when all (good) samples have the same value for one of the parameters, a minimum bandwidth is used instead of zero. Default: 1e-3
- parallel_strategy: dict or None, optional
The configuration of a parallel strategy to use for pending trials or broken trials. Default is a MaxParallelStrategy for broken trials and NoParallelStrategy for pending trials.
- Attributes
- requires_dist
- requires_type
DEHB¶
DEHB integrates a Differential Evolution algorithm with Hyperband. While BOHB uses Bayesian Optimization to select the hyperparameters to try at the first rung of subsequent brackets, DEHB uses Differential Evolution both to select the hyperparameters to try at the first rung of subsequent brackets and to mutate the best sets of hyperparameters when promoting trials inside a bracket. Because this mutation leads to different hyperparameter values and thus different trial ids, trials cannot be resumed after promotion to a higher fidelity level with DEHB, unlike with other variants of Hyperband. See Hyperband for more information on how to use multi-fidelity algorithms.
Note
Current implementation does not support more than one fidelity dimension.
Configuration¶
    experiment:
      algorithm:
        dehb:
          seed: null
          mutation_factor: 0.5
          crossover_prob: 0.5
          mutation_strategy: rand1
          crossover_strategy: bin
          boundary_fix_type: random
          min_clip: null
          max_clip: null
          max_age: 10e10
- class orion.algo.dehb.dehb.DEHB(space: Space, seed: int | None = None, mutation_factor: float = 0.5, crossover_prob: float = 0.5, mutation_strategy: str = 'rand1', crossover_strategy: str = 'bin', boundary_fix_type: str = 'random', min_clip: int | None = None, max_clip: int | None = None)[source]
Differential Evolution with HyperBand
This class is a wrapper around the library DEHB: https://github.com/automl/DEHB.
For more information on the algorithm, see original paper at https://arxiv.org/abs/2105.09821.
Awad, Noor, Neeratyoy Mallik, and Frank Hutter. “Dehb: Evolutionary hyperband for scalable, robust and efficient hyperparameter optimization.” arXiv preprint arXiv:2105.09821 (2021).
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default: None
- mutation_factor: float
Mutation probability. Default: 0.5
- crossover_prob: float
Crossover probability. Default: 0.5
- mutation_strategy: str
Mutation strategy: rand1, rand2dir, randtobest1, currenttobest1, best1, best2 or rand2. Default: 'rand1'
- crossover_strategy: str
Crossover strategy: bin or exp. Default: 'bin'
- boundary_fix_type: str
Boundary fix method: clip or random. Default: 'random'
- min_clip: float
Min clip when boundary fix method is clip. Default: None
- max_clip: float
Max clip when boundary fix method is clip. Default: None
- Attributes
- requires_dist
- requires_type
Methods
observe_one(trial): Observe a single trial
sample_to_trial(sample, fidelity): Convert a ConfigSpace sample into a trial
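To make the rand1 mutation strategy, the bin crossover strategy and the two boundary fix methods more concrete, here is a minimal sketch of one differential evolution step on the unit hypercube. It follows the textbook formulation rather than the DEHB library's code: the function names are hypothetical, and mutation_factor is used as the usual scaling factor of the difference vector, which is an assumption.

```python
import random


def rand1_mutation(population, mutation_factor, rng):
    """rand1 strategy: a + F * (b - c) from three distinct random members."""
    a, b, c = rng.sample(population, 3)
    return [ai + mutation_factor * (bi - ci) for ai, bi, ci in zip(a, b, c)]


def fix_boundary(vector, fix_type, rng):
    """Bring values back into [0, 1] by clipping or by uniform resampling."""
    fixed = []
    for value in vector:
        if 0.0 <= value <= 1.0:
            fixed.append(value)
        elif fix_type == "clip":
            fixed.append(min(max(value, 0.0), 1.0))
        else:  # "random"
            fixed.append(rng.random())
    return fixed


def bin_crossover(target, mutant, crossover_prob, rng):
    """Binomial crossover: take each gene from the mutant with probability crossover_prob."""
    forced = rng.randrange(len(target))  # at least one gene comes from the mutant
    return [
        m if (rng.random() < crossover_prob or i == forced) else t
        for i, (t, m) in enumerate(zip(target, mutant))
    ]


rng = random.Random(0)
population = [[rng.random() for _ in range(3)] for _ in range(5)]
mutant = fix_boundary(rand1_mutation(population, 0.5, rng), "random", rng)
child = bin_crossover(population[0], mutant, 0.5, rng)
print(child)
```

The child differs from its parent in at least one hyperparameter, which is consistent with promoted trials getting new trial ids in DEHB.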
Population Based Training (PBT)¶
Warning
PBT was broken in version v0.2.4. Make sure to use the latest release.
Population based training is an evolutionary algorithm that evolves trials from low fidelity levels to high fidelity levels (ex: number of epochs), reusing the model’s parameters along the way. This has the effect of creating hyperparameter schedules through the fidelity levels.
See documentation below for more information on the algorithm and how to use it.
Note
Current implementation does not support more than one fidelity dimension.
Configuration¶
    experiment:
      algorithm:
        pbt:
          population_size: 50
          generations: 10
          fork_timeout: 60
          exploit:
            of_type: PipelineExploit
            exploit_configs:
              - of_type: BacktrackExploit
                min_forking_population: 5
                truncation_quantile: 0.9
                candidate_pool_ratio: 0.2
              - of_type: TruncateExploit
                min_forking_population: 5
                truncation_quantile: 0.8
                candidate_pool_ratio: 0.2
          explore:
            of_type: PipelineExplore
            explore_configs:
              - of_type: ResampleExplore
                probability: 0.2
              - of_type: PerturbExplore
                factor: 1.2
                volatility: 0.0001
Note
Notice the additional strategy in the configuration, which is not mandatory for most other
algorithms. See StubParallelStrategy for more information.
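The exploit and explore entries of the configuration can be pictured with the sketch below. It is one plausible reading of truncation_quantile, candidate_pool_ratio, probability and factor, not the behaviour of Oríon's actual classes. The function names are hypothetical, objectives are assumed to be minimized, the prior is assumed to be uniform, and volatility is left out.

```python
import random


def truncate_exploit(trial, population, truncation_quantile, candidate_pool_ratio, rng):
    """Return the trial to continue from: itself, or a top performer if it ranks poorly.

    population maps a trial id to its objective (lower is better).
    """
    ranked = sorted(population, key=population.get)
    if ranked.index(trial) < int(truncation_quantile * len(ranked)):
        return trial  # good enough: promote as is
    pool_size = max(1, int(candidate_pool_ratio * len(ranked)))
    return rng.choice(ranked[:pool_size])  # fork from one of the best trials


def explore(value, low, high, rng, resample_prob=0.2, factor=1.2):
    """Resample from a uniform prior with some probability, otherwise perturb."""
    if rng.random() < resample_prob:
        return rng.uniform(low, high)
    perturbed = value * factor if rng.random() < 0.5 else value / factor
    return min(max(perturbed, low), high)


rng = random.Random(0)
population = {f"t{i}": i / 10 for i in range(10)}  # t0 is best, t9 is worst
source = truncate_exploit("t9", population, 0.8, 0.2, rng)
new_lr = explore(0.1, 0.0, 1.0, rng)
print(source, new_lr)
```

A trial in the worst 20% is replaced by a fork of one of the best trials, and the fork receives a perturbed or resampled hyperparameter value, which is how hyperparameter schedules emerge across fidelity levels.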
- class orion.algo.pbt.pbt.PBT(space: Space, seed: int | Sequence[int] | None = None, population_size: int = 50, generations: int = 10, exploit: dict | None = None, explore: dict | None = None, fork_timeout: int = 60)[source]
Population Based Training algorithm
Warning: PBT was broken in version v0.2.4. Make sure to use the latest release.
Population based training is an evolutionary algorithm that evolves trials from low fidelity levels to high fidelity levels (ex: number of epochs). For a population of size m, it first samples m trials at the lowest fidelity level. When trials are completed, it decides based on the exploit configuration whether the trial should be promoted to the next fidelity level or whether another trial should be selected instead and forked. When a trial is forked, new hyperparameters are selected based on the trial’s hyperparameters and the explore configuration. The original trial’s working_dir is then copied over to the new trial’s working_dir so that the user script can resume execution from the model parameters of the original trial.
It is important that the weights of models trained for each trial are saved in the corresponding directory at path trial.working_dir. The file name does not matter. The entire directory is copied to a new trial.working_dir when PBT selects a good model and explores new hyperparameters. The new trial can be resumed by the user by loading the weights found in the freshly copied new_trial.working_dir, and the weights should be saved back at the same path at the end of trial execution. To access trial.working_dir from Oríon’s commandline API, see documentation at https://orion.readthedocs.io/en/stable/user/script.html#command-line-templating. To access trial.working_dir from Oríon’s Python API, set argument trial_arg="trial" when executing method orion.client.experiment.ExperimentClient.workon().
The number of fidelity levels is determined by the argument generations. The lowest and highest fidelity levels, and the distribution, are determined by the search space’s dimension that has a prior fidelity(low, high, base), where base is the logarithm base of the dimension. The original PBT algorithm uses a base of 1.
PBT will try to return as many trials as possible when calling suggest(num), up to num. When population_size trials are sampled and more trials are requested, it will try to generate new trials by promoting or forking existing trials in a queue. This queue will get filled when calling observe(trials) on completed or broken trials.
If trials are broken at the lowest fidelity level, they are ignored and will not count in the population size so that PBT can sample additional trials to reach population_size completed trials at the lowest fidelity. If a trial is broken at a higher fidelity, the original trial leading to the broken trial is examined again for exploit and explore. If the broken trial was the result of a fork, then we backtrack to the trial that was dropped during exploit in favor of the forked trial. If the broken trial was a promotion, then we backtrack to the original trial that was promoted.
For more information on the algorithm, see original paper at https://arxiv.org/abs/1711.09846.
Jaderberg, Max, et al. “Population based training of neural networks.” arXiv preprint, arXiv:1711.09846 (2017).
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default: None
- population_size: int, optional
Size of the population. No trial will be continued until population_size trials have been executed at the lowest fidelity. If a trial is broken during execution at the lowest fidelity, the algorithm will sample a new trial, keeping the population of non-broken trials at population_size. For efficiency it is better to have fewer workers running than population_size. Default: 50.
- generations: int, optional
Number of generations, from lowest fidelity to highest one. This will determine how many branchings occur during the execution of PBT. Default: 10
- exploit: dict or None, optional
Configuration for a pbt.exploit.BaseExploit object that determines whether a trial should be exploited or not. If None, default configuration is a PipelineExploit with BacktrackExploit and TruncateExploit.
- explore: dict or None, optional
Configuration for a pbt.explore.BaseExplore object that returns new parameter values for exploited trials. If None, default configuration is a PipelineExplore with ResampleExplore and PerturbExplore.
- fork_timeout: int, optional
Maximum amount of time in seconds that an attempt to mutate a trial should take, otherwise algorithm.suggest() will raise SuggestionTimeout. Default: 60
Notes
It is important that the experiment using this algorithm has a working directory properly set. The experiment’s working dir serves as the base for the trials’ working directories.
The trial’s working directory is trial.working_dir. This is where the weights of the model should be saved. Using trial.hash_params to determine a unique working dir for the trial will result in working on a different directory than the one copied by PBT, hence missing the copied model parameters.
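The save, copy and resume cycle around trial.working_dir can be simulated without Oríon, as in the sketch below. Temporary directories stand in for trial.working_dir and new_trial.working_dir, a JSON file stands in for model weights, and `save_checkpoint` and `load_checkpoint` are hypothetical helpers that a user script might define.

```python
import json
import shutil
import tempfile
from pathlib import Path


def save_checkpoint(working_dir, state):
    """Save model state inside the trial's working directory (file name is arbitrary)."""
    Path(working_dir).mkdir(parents=True, exist_ok=True)
    (Path(working_dir) / "checkpoint.json").write_text(json.dumps(state))


def load_checkpoint(working_dir):
    """Return the saved state, or None when the trial starts from scratch."""
    path = Path(working_dir) / "checkpoint.json"
    return json.loads(path.read_text()) if path.exists() else None


root = Path(tempfile.mkdtemp())
parent_dir = root / "parent_trial"  # stands in for trial.working_dir
child_dir = root / "forked_trial"   # stands in for new_trial.working_dir

save_checkpoint(parent_dir, {"epoch": 5, "weights": [0.1, 0.2]})
shutil.copytree(parent_dir, child_dir)  # mimics the directory copy done on a fork

state = load_checkpoint(child_dir)  # the forked trial resumes from the copy
state["epoch"] += 5
save_checkpoint(child_dir, state)   # saved back at the same path
print(load_checkpoint(child_dir))
```

Since the whole directory is copied, a script that always reads and writes its checkpoint under the working directory it is given resumes correctly whether the trial was promoted or forked.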
Population Based Bandits (PB2)¶
Warning
PB2 was broken in version v0.2.4. Make sure to use the latest release.
Population Based Bandits is a variant of Population Based Training that uses a probabilistic model to guide the search instead of relying on purely random perturbations. The PB2 implementation uses a time-varying Gaussian process to model the optimization curves during training. This implementation is based on the ray-tune implementation. Oríon’s version supports discrete and categorical dimensions, and offers better resiliency to broken trials by using back-tracking.
See documentation below for more information on the algorithm and how to use it.
Note
Current implementation does not support more than one fidelity dimension.
Configuration¶
    experiment:
      algorithm:
        pb2:
          population_size: 50
          generations: 10
          fork_timeout: 60
          exploit:
            of_type: PipelineExploit
            exploit_configs:
              - of_type: BacktrackExploit
                min_forking_population: 5
                truncation_quantile: 0.9
                candidate_pool_ratio: 0.2
              - of_type: TruncateExploit
                min_forking_population: 5
                truncation_quantile: 0.8
                candidate_pool_ratio: 0.2
- class orion.algo.pbt.pb2.PB2(space, seed=None, population_size=50, generations=10, exploit=None, fork_timeout=60)[source]
Population Based Bandits
Warning: PB2 is broken in current version v0.2.4. We are working on a fix to be released in v0.2.5, ETA July 2022.
Population Based Bandits is a variant of Population Based Training that uses a probabilistic model to guide the search instead of relying on purely random perturbations. The PB2 implementation uses a time-varying Gaussian process to model the optimization curves during training. This implementation is based on the ray-tune implementation. Oríon’s version supports discrete and categorical dimensions, and offers better resiliency to broken trials by using back-tracking.
See PBT documentation for more information on how to use PBT algorithms.
For more information on the algorithm, see original paper at https://arxiv.org/abs/2002.02518.
Parker-Holder, Jack, Vu Nguyen, and Stephen J. Roberts. “Provably efficient online hyperparameter optimization with population-based bandits.” Advances in Neural Information Processing Systems 33 (2020): 17200-17211.
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default: None
- population_size: int, optional
Size of the population. No trial will be continued until population_size trials have been executed at the lowest fidelity. If a trial is broken during execution at the lowest fidelity, the algorithm will sample a new trial, keeping the population of non-broken trials at population_size. For efficiency it is better to have fewer workers running than population_size. Default: 50.
- generations: int, optional
Number of generations, from lowest fidelity to highest one. This will determine how many branchings occur during the execution of PBT. Default: 10
- exploit: dict or None, optional
Configuration for a pbt.exploit.BaseExploit object that determines whether a trial should be exploited or not. If None, default configuration is a PipelineExploit with BacktrackExploit and TruncateExploit.
- fork_timeout: int, optional
Maximum amount of time in seconds that an attempt to mutate a trial should take, otherwise algorithm.suggest() will raise SuggestionTimeout. Default: 60
TPE¶
The Tree-structured Parzen Estimator (TPE) algorithm is one of the Sequential Model-Based Global Optimization (SMBO) algorithms, which build models to propose new points based on the previously observed trials.
Instead of modeling p(y|x) like other SMBO algorithms, TPE models p(x|y) and p(y), and p(x|y) is modeled by transforming that generative process, replacing the distributions of the configuration prior with non-parametric densities.
The TPE defines p(x|y) using two such densities l(x) and g(x) where l(x) is distribution of good points and g(x) is the distribution of bad points. Good and bad points are split from observed points so far with a parameter gamma which defines the ratio of good points. New point candidates will be sampled with l(x) and Expected Improvement (EI) optimization scheme will be used to find the most promising point among the candidates.
Note
The current implementation only supports uniform, loguniform, uniform discrete and choices as priors. For the choices prior, any probabilities given will be ignored.
Configuration¶
    experiment:
      algorithm:
        tpe:
          seed: null
          n_initial_points: 20
          n_ei_candidates: 25
          gamma: 0.25
          equal_weight: False
          prior_weight: 1.0
          full_weight_num: 25
          parallel_strategy:
            of_type: StatusBasedParallelStrategy
            strategy_configs:
              broken:
                of_type: MaxParallelStrategy
            default_strategy:
              of_type: NoParallelStrategy
- class orion.algo.tpe.TPE(space: Space, seed: int | Sequence[int] | None = None, n_initial_points: int = 20, n_ei_candidates: int = 24, gamma: float = 0.25, equal_weight: bool = False, prior_weight: float = 1.0, full_weight_num: int = 25, max_retry: int = 100, parallel_strategy: dict | None = None)[source]
The Tree-structured Parzen Estimator (TPE) algorithm is one of the Sequential Model-Based Global Optimization (SMBO) algorithms, which build models to propose new points based on the previously observed trials.
Instead of modeling p(y|x) like other SMBO algorithms, TPE models p(x|y) and p(y), and p(x|y) is modeled by transforming that generative process, replacing the distributions of the configuration prior with non-parametric densities.
The TPE defines p(x|y) using two such densities l(x) and g(x), where l(x) is the distribution of good points and g(x) is the distribution of bad points. New point candidates will be sampled with l(x) and the Expected Improvement (EI) optimization scheme will be used to find the most promising point among the candidates.
For more information on the algorithm, see original papers at:
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int, optional
Seed to sample initial points and candidate points. Default: None
- n_initial_points: int, optional
Number of initial points randomly sampled. If new points are requested and fewer than n_initial_points are observed, the next points will also be sampled randomly instead of being sampled from the Parzen estimators. Default: 20
- n_ei_candidates: int, optional
Number of candidate points sampled for the EI computation. Larger numbers will lead to more exploitation and lower numbers will lead to more exploration. Be careful with categorical dimensions, as TPE tends to severely exploit these if n_ei_candidates is larger than 1. Default: 24
- gamma: real, optional
Ratio to split the observed trials into good and bad distributions. Lower numbers will lead to more exploitation and larger numbers will lead to more exploration. Default: 0.25
- equal_weight: bool, optional
True to set equal weights for observed points. Default: False
- prior_weight: float, optional
The weight given to the prior point of the input space. Default: 1.0
- full_weight_num: int, optional
The number of the most recent trials which get the full weight, while the others are weighted with a linear ramp from 0 to 1.0. It only takes effect if equal_weight is False.
- max_retry: int, optional
Number of attempts to sample new points if the sampled points were already suggested. Default: 100
- parallel_strategy: dict or None, optional
The configuration of a parallel strategy to use for pending trials or broken trials. Default is a MaxParallelStrategy for broken trials and NoParallelStrategy for pending trials.
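The roles of gamma, l(x) and g(x) can be illustrated on a single dimension with the sketch below. It is a simplification and not Oríon's implementation: the candidates are passed in directly instead of being sampled from l(x), the densities are plain Gaussian kernel density estimates with a fixed bandwidth, observation weights are ignored, and all function names are hypothetical. It relies on the result from the TPE literature that maximizing l(x) / g(x) maximizes Expected Improvement.

```python
import math


def split_good_bad(observations, gamma):
    """Split (x, y) observations: the best gamma fraction is 'good' (lower y is better)."""
    ranked = sorted(observations, key=lambda obs: obs[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))
    return [x for x, _ in ranked[:n_good]], [x for x, _ in ranked[n_good:]]


def kde(x, points, bandwidth=0.1):
    """Gaussian kernel density estimate at x."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def best_candidate(candidates, observations, gamma=0.25):
    """Pick the candidate with the largest ratio l(x) / g(x)."""
    good, bad = split_good_bad(observations, gamma)
    # The small constant avoids a division by zero far away from all bad points.
    return max(candidates, key=lambda x: kde(x, good) / (kde(x, bad) + 1e-12))


observations = [
    (0.2, 0.1), (0.25, 0.15), (0.5, 0.6), (0.55, 0.7),
    (0.6, 0.65), (0.7, 0.8), (0.8, 0.9), (0.9, 1.0),
]
print(split_good_bad(observations, 0.25)[0])
print(best_candidate([0.2, 0.5, 0.8], observations))
```

With gamma = 0.25 the two best of the eight observations form l(x), so the candidate closest to them is preferred. A smaller gamma narrows l(x) further and pushes the search towards exploitation.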
Ax¶
Ax is a platform for optimizing any kind of experiment, including machine learning experiments, A/B tests, and simulations. Ax can optimize discrete configurations (e.g., variants of an A/B test) using multi-armed bandit optimization, and continuous (e.g., integer or floating point)-valued configurations using Bayesian optimization.
Configuration¶
    experiment:
      algorithm:
        ax:
          seed: 1234
          n_initial_trials: 5
          parallel_strategy:
            of_type: StatusBasedParallelStrategy
            strategy_configs:
              broken:
                of_type: MaxParallelStrategy
- class orion.algo.axoptimizer.AxOptimizer(space: Space, seed: Optional[int] = None, n_initial_trials: Optional[int] = 20, extra_objectives: Optional[List[str]] = None, constraints: Optional[List[str]] = None)[source]
Wrapper around the Ax platform for multi-objectives optimization and constraints.
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int, optional
Random seed for reproducibility. Works only for the Sobol quasi-random generator and for BoTorch-powered models. For the latter models, the trials generated from the same optimization setup with the same seed will be mostly similar, but the exact parameter values may still vary and trials later in the optimization will diverge more and more. This is because a degree of randomness is essential for high performance of the Bayesian optimization models and is not controlled by the seed.
Note
In multi-threaded environments, the random seed is thread-safe, but does not actually guarantee reproducibility. Whether the outcomes will be exactly the same for two identical operations that use the random seed depends on whether the threads modify the random state in the same order across the two operations. Default: None
- n_initial_trials: int, optional
Specific number of initialization trials. Initialization trials are generated quasi-randomly using Sobol.
- extra_objectives: sequence of str, optional
List of metric names which are also objectives to minimize. threshold: The bound in the objective’s threshold constraint.
Note
Orion expects the extra_objectives results to be stored in orion.core.worker.Trial.statistics
- constraints: sequence of str, optional
Dict of list of string representation of metrics constraints of form [“metric_name >= bound”], like [“m1 <= 3”]
Note
Orion expects the constraints results to be stored in orion.core.worker.Trial.constraints
Note
See https://ax.dev/docs/core.html#optimization-config for more details about how Ax expects its outcome constraints
- Attributes
- requires_dist
Methods
Instantiate a new AxClient from previous snapshot
reverse_params(ax_params, space): Reverse converted choices dimensions values to their original types
transform_params(orion_params, space): Convert orion parameter values
Evolution-ES¶
Evolution-ES is an evolutionary algorithm with early stopping. It follows the tournament selection algorithm of Large-Scale-Evolution. Tournament selection evolutionary hyper-parameter search is conducted by first defining a gene encoding that describes a hyper-parameter combination, and then creating the initial population by randomly sampling from the space of gene encodings to create individuals, which are trained and assigned fitnesses. The population is then repeatedly sampled to produce groups, and the individual with the highest fitness in each group is selected as the parent. Selected parents have their gene encodings mutated to produce child models. The individual in the group with the lowest fitness is killed, while the newly evaluated child model is added to the population, taking the killed individual’s place. This process is repeated and results in a population of high-fitness individuals that represent good hyper-parameter combinations. Evolution-ES also formulates a method to dynamically allocate resources to more promising individuals according to their fitness, referred to as Progressive Dynamic Hurdles (PDH), which allows individuals that are consistently performing well to train for more steps. It can be roughly interpreted as a sophisticated random search that leverages partial information of the trial execution to concentrate resources on the most promising ones.
The implementation follows the process and usage of Hyperband. Additionally, the fidelity base in Evolution-ES can be extended to support fidelity(low, high, base=1), which is the same as linspace(low, high).
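The tournament selection loop described above can be sketched in a few lines of plain Python. This is a toy illustration, not Oríon's implementation: the names `tournament_step`, `fitness`, and `mutate` are hypothetical, the "gene encoding" is a single float, and the early-stopping part (PDH) is left out.

```python
import random


def tournament_step(population, fitness_fn, mutate_fn, group_size, rng):
    """Run one tournament-selection step, updating `population` in place."""
    # Sample a group of distinct individuals (by index) from the population.
    group = rng.sample(range(len(population)), group_size)
    # The individual with the highest fitness in the group becomes the parent.
    parent_idx = max(group, key=lambda i: fitness_fn(population[i]))
    # The individual with the lowest fitness in the group is killed.
    worst_idx = min(group, key=lambda i: fitness_fn(population[i]))
    # The mutated child takes the killed individual's place.
    population[worst_idx] = mutate_fn(population[parent_idx], rng)
    return parent_idx, worst_idx


def fitness(x):
    # Toy fitness (assumption): higher is better, with the optimum at x = 3.
    return -((x - 3.0) ** 2)


def mutate(x, rng):
    # Toy mutation (assumption): a small random perturbation of the parent.
    return x + rng.uniform(-0.5, 0.5)


rng = random.Random(0)
population = [rng.uniform(-10.0, 10.0) for _ in range(20)]
initial_best = max(fitness(x) for x in population)

for _ in range(200):
    tournament_step(population, fitness, mutate, group_size=5, rng=rng)

final_best = max(fitness(x) for x in population)
print(f"best fitness: {initial_best:.4f} -> {final_best:.4f}")
```

Because only the lowest-fitness member of each sampled group is replaced, the population size stays constant and the best individual found so far is not discarded.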
Configuration¶
experiment:
    algorithm:
        EvolutionES:
            seed: null
            repetitions: 1
            nums_population: 20
            mutate:
                function: orion.algo.mutate_functions.default_mutate
                multiply_factor: 3.0
                add_factor: 1
- class orion.algo.evolution_es.EvolutionES(space: Space, seed: int | Sequence[int] | None = None, repetitions: int | float = inf, nums_population: int = 20, mutate: str | dict | None = None, max_retries: int = 1000)[source]
EvolutionES formulates hyperparameter optimization as an evolution.
For more information on the algorithm, see original paper at https://arxiv.org/pdf/1703.01041.pdf and https://arxiv.org/pdf/1901.11117.pdf
Real et al., “Large-Scale Evolution of Image Classifiers”; So et al., “The Evolved Transformer”.
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default:
None
- repetitions: int
Number of executions of Hyperband. Default is numpy.inf, which means Hyperband runs until no new trials can be suggested.
- nums_population: int
Population size for EvolutionES. A larger population often gives better performance but requires more computation, so the right value is a trade-off that depends on the search space and the budget available for your problem. Default: 20
- mutate: str or dict or None, optional
A custom mutate function can be defined here along with its mutate factors, such as a multiply factor (multiply or divide by the factor) and an add factor (add or subtract the factor). The function must be given as an importable string. If None, the default mutate function is used: orion.algo.evolution_es.mutate_functions.default_mutate.
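To illustrate what the two mutate factors mean, here is a minimal standalone sketch. It is not Oríon's `default_mutate`, and its signature is hypothetical. As an assumption made for the example, real values are multiplied or divided by `multiply_factor`, integer values are shifted by `add_factor`, and the result is clipped to the dimension's bounds.

```python
import random


def sketch_mutate(old_value, low, high, rng, multiply_factor=3.0, add_factor=1):
    """Hypothetical mutate function; NOT the signature Oríon expects."""
    if isinstance(old_value, int):
        # Integer value: add or subtract the add factor.
        if rng.random() < 0.5:
            new_value = old_value + add_factor
        else:
            new_value = old_value - add_factor
    else:
        # Real value: multiply or divide by the multiply factor.
        if rng.random() < 0.5:
            new_value = old_value * multiply_factor
        else:
            new_value = old_value / multiply_factor
    # Keep the mutated value inside the dimension's bounds.
    return min(max(new_value, low), high)


rng = random.Random(1234)
new_lr = sketch_mutate(0.01, low=1e-5, high=1.0, rng=rng)
new_layers = sketch_mutate(4, low=1, high=10, rng=rng)
print(new_lr, new_layers)
```

A real custom function must match the interface of the default mutate function and be importable from the string given under `mutate.function`.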
Methods
create_bracket
MOFA¶
The MOdular FActorial Design (MOFA) algorithm uses factorial design and factorial analysis methods to optimize hyperparameters. It performs multiple iterations, each of which starts by sampling hyperparameter trial values from an orthogonal Latin hypercube to cover the search space well while de-correlating hyperparameters. Once all trials in an iteration are returned, MOFA performs factorial analysis to determine which hyperparameters should be fixed in value and which require further exploration. As hyperparameters become fixed, the number of trials is reduced in subsequent iterations.
Note
MOFA requires Python v3.8 or greater and scipy v1.8 or greater.
Note
Default values for the index, n_levels, and strength parameters are set to the empirically obtained optimal values described in section 5.2 of the paper. The strength parameter must be set to either 1 or 2.
Note
The number of trials N for a single MOFA iteration is set to N = index * n_levels^strength. The value of --exp-max-trials should be a multiple of N.
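As a quick check of the formula above, the following sketch computes N for a given configuration and rounds a trial budget up to the next multiple of N. The helper names are hypothetical and are not part of Oríon.

```python
def mofa_trials_per_iteration(index: int, n_levels: int, strength: int) -> int:
    """Return N = index * n_levels^strength for one MOFA iteration."""
    if strength not in (1, 2):
        raise ValueError("strength must be 1 or 2")
    return index * n_levels**strength


def round_up_to_multiple(max_trials: int, n: int) -> int:
    """Smallest multiple of n that is greater than or equal to max_trials."""
    return -(-max_trials // n) * n


# Default MOFA settings: index=1, n_levels=5, strength=2.
n = mofa_trials_per_iteration(index=1, n_levels=5, strength=2)
print(n)                            # 25
print(round_up_to_multiple(60, n))  # 75
```

With the defaults, each iteration uses 25 trials, so a budget of 60 trials would be raised to 75 to cover three full iterations.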
Configuration¶
experiment:
    algorithm:
        MOFA:
            seed: null
            index: 1
            n_levels: 5
            strength: 2
            threshold: 0.1
- class orion.algo.mofa.mofa.MOFA(space: Space, seed: int | Sequence[int] | None = None, index: int = 1, n_levels: int = 5, strength: int = 2, threshold: float = 0.1)[source]
MOdular FActorial Design (MOFA).
For more information on the algorithm, see original paper: MOFA: Modular Factorial Design for Hyperparameter Optimization https://arxiv.org/abs/2011.09545
Xiong, Bo, Yimin Huang, Hanrong Ye, Steffen Staab, and Zhenguo Li. “MOFA: Modular Factorial Design for Hyperparameter Optimization.” arXiv preprint arXiv:2011.09545 (2020).
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default:
None
- index: int, optional
This is the lambda parameter in the paper. Default:
1
- n_levels: int, optional
Number of levels in the orthogonal Latin hypercube (OLH) table. Should be set to a prime number. This is the l parameter in the paper. Default:
5
- strength: int, optional
Strength parameter. This is the t parameter in the paper. Default:
2
- threshold: float, optional
The threshold used to determine whether a dimension has been explored enough and can be fixed. Default: 0.1
Notes
Default values for the index, n_levels, and strength (t) parameters are set to the empirically obtained optimal values described in section 5.2 of the paper.
The number of trials N for a single MOFA iteration is set to N = index * n_levels^t. The --exp-max-trials should be a multiple of N.
MOFA requires Python v3.8 or greater and scipy v1.8 or greater.
Nevergrad¶
Nevergrad is a derivative-free optimization platform providing a library of algorithms for hyperparameter search.
experiment:
    algorithm:
        nevergrad:
            seed: null
            budget: 1000
            num_workers: 10
            model_name: NGOpt
- class orion.algo.nevergradoptimizer.NevergradOptimizer(space: Space, model_name: str = 'NGOpt', seed: int | Sequence[int] | None = None, budget: int = 100, num_workers: int = 10)[source]
Wraps the nevergrad library to expose its algorithms to Oríon.
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- model_name: str
Nevergrad model to use as optimizer
- budget: int
Maximum number of trials to generate.
- num_workers: int
Number of workers to use.
- seed: None, int or sequence of int
Seed for the random number generator used to sample new trials. Default:
None
- Attributes
- requires_shape
HEBO¶
Warning
HEBO package does not work with numpy>=1.24.0.
Evolutionary algorithms from the HEBO repository are made available in Oríon. There is a wide range of configuration options for these algorithms, including the choice of model, evolutionary strategy, and acquisition function.
Configuration¶
experiment:
    algorithm:
        hebo:
            seed: 1234
            parameters:
                model_name: catboost
                random_samples: 5
                acquisition_class: hebo.acquisitions.acq.MACE
                evolutionary_strategy: nsga2
                model_config: null
- class orion.algo.hebo.hebo_algo.HEBO(space: Space, seed: int | None = None, parameters: Parameters | dict | None = None)[source]
Adapter for the HEBO algorithm from https://github.com/huawei-noah/HEBO
- Parameters
- space: `orion.algo.space.Space`
Optimisation space with priors for each dimension.
- seed: None or int
Base seed for the random number generators. Defaults to None, in which case the randomness is not seeded.
- parameters: Parameters, dict or None
Parameters for the HEBO algorithm.
- Attributes
- requires_dist
- requires_type
Methods
Parameters
(model_name, random_samples, ...)Parameters of the HEBO algorithm.
Algorithm Plugins¶
Plugin documentation is hosted separately. The short descriptions below link to each plugin's full documentation.
Scikit-Optimize¶
This package is a plugin providing a wrapper for skopt optimizers.
For more information, you can find the documentation at orionalgoskopt.readthedocs.io.
Robust Bayesian Optimization¶
This package is a plugin providing a wrapper for RoBO optimizers.
You will find in this plugin many models for Bayesian Optimization: Gaussian Process, Gaussian Process with MCMC, Random Forest, DNGO and BOHAMIANN.
For more information, you can find the documentation at epistimio.github.io/orion.algo.robo.
Parallel Strategies¶
A parallel strategy is a method to improve parallel optimization for sequential algorithms. Such algorithms can only observe trials that are completed and have a corresponding objective. To get around this, parallel strategies produce lies: non-completed trials with fake objectives, which algorithms can use to avoid exploring the space near pending or broken trials. The strategies differ in how they assign objectives to the lies.
NoParallelStrategy¶
Does not return any lie. This is useful to benchmark parallel strategies and measure how they can help compared to no strategy.
StubParallelStrategy¶
Assigns to lies an objective of None so that non-completed trials are observed and identifiable by algorithms that can leverage parallel optimization. The value of the objective is customizable with stub_value.
MaxParallelStrategy¶
Assigns to lies the best objective observed so far.
The default value assigned to the objective when no trial is completed yet is configurable with default_result. It is float('inf') by default.
MeanParallelStrategy¶
Assigns to lies the mean of all objectives observed so far.
The default value assigned to the objective when fewer than 2 trials are completed is configurable with default_result. It is float('inf') by default.
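The rule for the mean strategy can be written as a short standalone function. This is a sketch of the behaviour described above, not Oríon's class: the function name `mean_lie_objective` is hypothetical, and it works on a plain list of objective values rather than on trial objects.

```python
from statistics import fmean


def mean_lie_objective(completed_objectives, default_result=float("inf")):
    """Fake objective assigned to a lie under a mean-based strategy."""
    # With fewer than 2 completed trials, fall back to the default result.
    if len(completed_objectives) < 2:
        return default_result
    # Otherwise use the mean of all objectives observed so far.
    return fmean(completed_objectives)


print(mean_lie_objective([]))               # inf
print(mean_lie_objective([0.5]))            # inf
print(mean_lie_objective([1.0, 2.0, 3.0]))  # 2.0
```

Passing a finite `default_result` changes only the fallback used before enough trials have completed.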
StatusBasedParallelStrategy¶
Uses a different strategy based on the status of the trial at hand.