Algorithms

Default algorithm is a random search based on the probability distribution given to a search parameter’s definition.

Selecting and Configuring

In a Oríon configuration YAML, define:

experiment:
    algorithms:
        gradient_descent:
            learning_rate: 0.1

In this particular example, the name of the algorithm extension class to be imported and instantiated is Gradient_Descent, so the lower-case identifier corresponds to it.

All algorithms have default arguments that should work reasonably well in general. To tune the algorithm for a specific problem, you can set those arguments in the yaml file as shown above with learning_rate.

Included Algorithms

Hyperband

Hyperband extends the SuccessiveHalving algorithm by providing a way to exploit a fixed budget with different number of configurations for SuccessiveHalving algorithm to evaluate. Each run of SuccessiveHalving will be defined as a bracket in Hyperband. Hyperband requires two inputs (1) R, the maximum amount of resource that can be allocated to a single configuration, and (2) eta, an input that controls the proportion of configurations discarded in each round of SuccessiveHalving.

To use Hyperband in Oríon, you must specify one parameter with fidelity(low, high, base) as the prior, low will be ignored, high will be taken as the maximum resource R and base will be taken as the reduction factor eta.

Number of epochs usually can be used as the resource but the algorithm is generic and can be applied to any multi-fidelity setting. That is, you can use training time, specifying the fidelity with --epochs~fidelity(low=1, high=81, base=3) (assuming your script takes this argument in commandline), but you could also use other fidelity such as dataset size --dataset-size~fidelity(low=500, high=50000) (assuming your script takes this argument and adapt dataset size accordingly).

Note

Current implementation does not support more than one fidelity dimension.

Configuration

experiment:
    algorithms:
        hyperband:
            seed: null
            repetitions: 1
class orion.algo.hyperband.Hyperband(space, seed=None, repetitions=inf)[source]

Hyperband formulates hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.`

For more information on the algorithm, see original paper at http://jmlr.org/papers/v18/16-558.html.

Li, Lisha et al. “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization” Journal of Machine Learning Research, 18:1-52, 2018.

Parameters
space: `orion.algo.space.Space`

Optimisation space with priors for each dimension.

seed: None, int or sequence of int

Seed for the random number generator used to sample new trials. Default: None

repetitions: int

Number of executions for Hyperband. A single execution of Hyperband takes a finite budget of (log(R)/log(eta) + 1) * (log(R)/log(eta) + 1) * R, and repetitions allows you to run multiple executions of Hyperband. Default is numpy.inf which means to run Hyperband until no new trials can be suggested.

ASHA

Asynchronous Successive Halving Algorithm, the asynchronous version of Hyperband, can be roughly interpreted as a sophisticated random search that leverages partial information of the trial execution to concentrate resources on the most promising ones.

The main idea of the algorithm is the following. Given a fidelity dimension, such as the number of epochs to train or the size of the dataset, ASHA samples trials with low-fidelity and promotes the most promising ones to the next fidelity level. This makes it possible to only execute one trial with full fidelity, leading to very optimal resource usage.

The most common way of using ASHA is to reduce the number of epochs, but the algorithm is generic and can be applied to any multi-fidelity setting. That is, you can use training time, specifying the fidelity with --epochs~fidelity(low=1, high=100) (assuming your script takes this argument in commandline), but you could also use other fidelity such as dataset size --dataset-size~fidelity(low=500, high=50000) (assuming your script takes this argument and adapt dataset size accordingly). The placeholder fidelity(low, high) is a special prior for multi-fidelity algorithms.

Note

Current implementation does not support more than one fidelity dimension.

Configuration

experiment:
    algorithms:
        asha:
            seed: null
            num_rungs: null
            num_brackets: 1
            repetitions: 1
class orion.algo.asha.ASHA(space, seed=None, num_rungs=None, num_brackets=1, repetitions=inf)[source]

Asynchronous Successive Halving Algorithm

A simple and robust hyperparameter tuning algorithm with solid theoretical underpinnings that exploits parallelism and aggressive early-stopping.

For more information on the algorithm, see original paper at https://arxiv.org/abs/1810.05934.

Li, Liam, et al. “Massively parallel hyperparameter tuning.” arXiv preprint arXiv:1810.05934 (2018)

Parameters
space: `orion.algo.space.Space`

Optimisation space with priors for each dimension.

seed: None, int or sequence of int

Seed for the random number generator used to sample new trials. Default: None

num_rungs: int, optional

Number of rungs for the largest bracket. If not defined, it will be equal to (base + 1) of the fidelity dimension. In the original paper, num_rungs == log(fidelity.high/fidelity.low) / log(fidelity.base) + 1. Default: log(fidelity.high/fidelity.low) / log(fidelity.base) + 1

num_brackets: int

Using a grace period that is too small may bias ASHA too strongly towards fast converging trials that do not lead to best results at convergence (stagglers). To overcome this, you can increase the number of brackets, which increases the amount of resource required for optimisation but decreases the bias towards stragglers. Default: 1

repetitions: int

Number of execution of ASHA. Default is numpy.inf which means to run ASHA until no new trials can be suggested.

Population Based Training (PBT)

Population based training is an evolutionary algorithm that evolve trials from low fidelity levels to high fidelity levels (ex: number of epochs), reusing the model’s parameters along the way. This has the effect of creating hyperparameter schedules through the fidelity levels.

See documentation below for more information on the algorithm and how to use it.

Note

Current implementation does not support more than one fidelity dimension.

Configuration

experiment:

  strategy: StubParallelStrategy

  algorithms:
    pbt:
      population_size: 50
      generations: 10
      fork_timeout: 60
      exploit:
        of_type: PipelineExploit
        exploit_configs:
          - of_type: BacktrackExploit
            min_forking_population: 5
            truncation_quantile: 0.9
            candidate_pool_ratio: 0.2
          - of_type: TruncateExploit
            min_forking_population: 5
            truncation_quantile: 0.8
            candidate_pool_ratio: 0.2
       explore:
         of_type: PipelineExplore
         explore_configs:
           - of_type: ResampleExplore
             probability: 0.2
           - of_type: PerturbExplore
             factor: 1.2
             volatility: 0.0001

Note

Notice the additional strategy in configuration which is not mandatory for most other algorithms. See StubParallelStrategy for more information.

class orion.algo.pbt.pbt.PBT(space, seed=None, population_size=50, generations=10, exploit=None, explore=None, fork_timeout=60)[source]

Population Based Training algorithm

Population based training is an evolutionary algorithm that evolve trials from low fidelity levels to high fidelity levels (ex: number of epochs). For a population of size m, it first samples m trials at lowest fidelity level. When trials are completed, it decides based on the exploit configuration whether the trial should be promoted to next fidelity level or whether another trial should be selected instead and forked. When a trial is forked, new hyperparameters are selected based on the trials hyperparameters and the explore configuration. The original trial’s working_dir is then copied over to the new trial’s working_dir so that the user script can resume execution from model parameters of original trial.

It is important that the weights of models trained for each trial are saved in the corresponding directory at path trial.working_dir. The file name does not matter. The entire directory is copied to a new trial.working_dir when PBT selects a good model and explore new hyperparameters. The new trial can be resumed by the user by loading the weigths found in the freshly copied new_trial.working_dir, and saved back at the same path at end of trial execution. To access trial.working_dir from Oríon’s commandline API, see documentation at https://orion.readthedocs.io/en/stable/user/script.html#command-line-templating. To access trial.working_dir from Oríon’s Python API, set argument trial_arg="trial" when executing method orion.client.experiment.ExperimentClient.workon().

The number of fidelity levels is determined by the argument generations. The lowest and highest fidelity levels, and the distrubition, is determined by the search space’s dimension that will have a prior fidelity(low, high, base), where base is the logarithm base of the dimension. Original PBT algorithm uses a base of 1.

PBT will try to return as many trials as possible when calling suggest(num), up to num. When population_size trials are sampled and more trials are requested, it will try to generate new trials by promoting or forking existing trials in a queue. This queue will get filled when calling observe(trials) on completed or broken trials.

If trials are broken at lowest fidelity level, they are ignored and will not count in population size so that PBT can sample additional trials to reach population_size completed trials at lowest fidelity. If a trial is broken at higher fidelity, the original trial leading to the broken trial is examinated again for exploit and explore. If the broken trial was the result of a fork, then we backtrack to the trial that was dropped during exploit in favor of the forked trial. If the broken trial was a promotion, then we backtrack to the original trial that was promoted.

For more information on the algorithm, see original paper at https://arxiv.org/abs/1711.09846.

Jaderberg, Max, et al. “Population based training of neural networks.” arXiv preprint, arXiv:1711.09846 (2017).

Parameters
space: `orion.algo.space.Space`

Optimisation space with priors for each dimension.

seed: None, int or sequence of int

Seed for the random number generator used to sample new trials. Default: None

population_size: int, optional

Size of the population. No trial will be continued until there are population_size trials executed until lowest fidelity. If a trial is broken during execution at lowest fidelity, the algorithm will sample a new trial, keeping the population of non-broken trials at population_size. For efficiency it is better to have less workers running than population_size. Default: 50.

generations: int, optional

Number of generations, from lowest fidelity to highest one. This will determine how many branchings occur during the execution of PBT. Default: 10

exploit: dict or None, optional

Configuration for a pbt.exploit.BaseExploit object that determines when if a trial should be exploited or not. If None, default configuration is a PipelineExploit with BacktrackExploit and TruncateExploit.

explore: dict or None, optional

Configuration for a pbt.explore.BaseExplore object that returns new parameter values for exploited trials. If None, default configuration is a PipelineExplore with ResampleExplore and PerturbExplore.

fork_timeout: int, optional

Maximum amount of time in seconds that an attempt to mutate a trial should take, otherwise algorithm.suggest() will raise SuggestionTimeout. Default: 60

Notes

It is important that the experiment using this algorithm has a working directory properly set. The experiment’s working dir serve as the base for the trial’s working directories.

The trial’s working directory is trial.working_dir. This is where the weights of the model should be saved. Using trial.hash_params to determine a unique working dir for the trial will result in working on a different directory than the one copied by PBT, hence missing the copied model parameters.

TPE

Tree-structured Parzen Estimator (TPE) algorithm is one of Sequential Model-Based Global Optimization (SMBO) algorithms, which will build models to propose new points based on the historical observed trials.

Instead of modeling p(y|x) like other SMBO algorithms, TPE models p(x|y) and p(y), and p(x|y) is modeled by transforming that generative process, replacing the distributions of the configuration prior with non-parametric densities.

The TPE defines p(x|y) using two such densities l(x) and g(x) where l(x) is distribution of good points and g(x) is the distribution of bad points. Good and bad points are split from observed points so far with a parameter gamma which defines the ratio of good points. New point candidates will be sampled with l(x) and Expected Improvement (EI) optimization scheme will be used to find the most promising point among the candidates.

Note

Current implementation only supports uniform, loguniform, uniform discrete and choices as prior. As for choices prior, the probabilities if any given will be ignored.

Configuration

experiment:
    algorithms:
        tpe:
            seed: null
            n_initial_points: 20
            n_ei_candidates: 25
            gamma: 0.25
            equal_weight: False
            prior_weight: 1.0
            full_weight_num: 25
            parallel_strategy:
                of_type: StatusBasedParallelStrategy
                strategy_configs:
                    broken:
                        of_type: MaxParallelStrategy
                default_strategy:
                    of_type: NoParallelStrategy
class orion.algo.tpe.TPE(space, seed=None, n_initial_points=20, n_ei_candidates=24, gamma=0.25, equal_weight=False, prior_weight=1.0, full_weight_num=25, max_retry=100, parallel_strategy=None)[source]

Tree-structured Parzen Estimator (TPE) algorithm is one of Sequential Model-Based Global Optimization (SMBO) algorithms, which will build models to propose new points based on the historical observed trials.

Instead of modeling p(y|x) like other SMBO algorithms, TPE models p(x|y) and p(y), and p(x|y) is modeled by transforming that generative process, replacing the distributions of the configuration prior with non-parametric densities.

The TPE defines p(x|y) using two such densities l(x) and g(x) while l(x) is distribution of good points and g(x) is the distribution of bad points. New point candidates will be sampled with l(x) and Expected Improvement (EI) optimization scheme will be used to find the most promising point among the candidates.

For more information on the algorithm, see original papers at:

Parameters
space: `orion.algo.space.Space`

Optimisation space with priors for each dimension.

seed: None, int or sequence of int, optional

Seed to sample initial points and candidates points. Default: None

n_initial_points: int, optional

Number of initial points randomly sampled. If new points are requested and less than n_initial_points are observed, the next points will also be sampled randomly instead of being sampled from the parzen estimators. Default: 20

n_ei_candidates: int, optional

Number of candidates points sampled for ei compute. Larger numbers will lead to more exploitation and lower numbers will lead to more exploration. Be carefull with categorical dimension as TPE tend to severily exploit these if n_ei_candidates is larger than 1. Default: 24

gamma: real, optional

Ratio to split the observed trials into good and bad distributions. Lower numbers will load to more exploitation and larger numbers will lead to more exploration. Default: 0.25

equal_weight: bool, optional

True to set equal weights for observed points. Default: False

prior_weight: int, optional

The weight given to the prior point of the input space. Default: 1.0

full_weight_num: int, optional

The number of the most recent trials which get the full weight where the others will be applied with a linear ramp from 0 to 1.0. It will only take effect if equal_weight is False.

max_retry: int, optional

Number of attempts to sample new points if the sampled points were already suggested. Default: 100

parallel_strategy: dict or None, optional

The configuration of a parallel strategy to use for pending trials or broken trials. Default is a MaxParallelStrategy for broken trials and NoParallelStrategy for pending trials.

Evolution-ES

Evolution-ES, the evolution algorithm with early stop version. Here is an implementation of Evolution-ES. In the evolution algorithm, we follow the tournament selection algorithm as Large-Scale-Evolution. Tournament selection evolutionary hyper-parameter search is conducted by first defining a gene encoding that describes a hyper-parameter combination, and then creating the initial population by randomly sampling from the space of gene encodings to create individuals, which are trained and assigned fitnesses. The population is then repeatedly sampled from to produce groups, and the parent is selected by the individual with the highest fitness. Selected parents have their gene encodings mutated to produce child models. Individual in the group with the lowest fitness is killed, while the newly evaluated child model is added to the population, taking the killed individual’s place. This process is repeated and results in a population with high fitness individuals can represent the good hyper-parameter combination. Evolution-ES also formulated a method to dynamically allocate resources to more promising individual according to their fitness, which is referred to as Progressive Dynamic Hurdles (PDH), allows individuals that are consistently performing well to train for more steps. It can be roughly interpreted as a sophisticated random search that leverages partial information of the trial execution to concentrate resources on the most promising ones.

The implementation follows the process and use way of Hyperband. Additionally, The fidelity base in Evolution-ES can be extended to support fidelity(low, high, base=1), which is the same as linspace(low, high).

Configuration

experiment:
    algorithms:
        EvolutionES:
            seed: null
            repetitions: 1
            nums_population: 20
            mutate:
                function: orion.algo.mutate_functions.default_mutate
                multiply_factor: 3.0
                add_factor: 1
class orion.algo.evolution_es.EvolutionES(space, seed=None, repetitions=inf, nums_population=20, mutate=None, max_retries=1000)[source]

EvolutionES formulates hyperparameter optimization as an evolution.

For more information on the algorithm, see original paper at https://arxiv.org/pdf/1703.01041.pdf and https://arxiv.org/pdf/1901.11117.pdf

Real et al. “Large-Scale Evolution of Image Classifiers” So et all. “The Evolved Transformer”

Parameters
space: `orion.algo.space.Space`

Optimisation space with priors for each dimension.

seed: None, int or sequence of int

Seed for the random number generator used to sample new trials. Default: None

repetitions: int

Number of execution of Hyperband. Default is numpy.inf which means to run Hyperband until no new trials can be suggested.

nums_population: int

Number of population for EvolutionES. Larger number of population often gets better performance but causes more computation. So there is a trade-off according to the search space and required budget of your problems. Default: 20

mutate: str or None, optional

In the mutate part, one can define the customized mutate function with its mutate factors, such as multiply factor (times/divides by a multiply factor) and add factor (add/subtract by a multiply factor). The function must be defined by an importable string. If None, default mutate function is used: orion.algo.mutate_functions.default_mutate.

Algorithm Plugins

Plugins documentation is hosted separately. See short documentations below to find links to full plugins documentation.

Scikit-Optimize

This package is a plugin providing a wrapper for skopt optimizers.

For more information, you can find the documentation at orionalgoskopt.readthedocs.io.

Robust Bayesian Optimization

This package is a plugin providing a wrapper for RoBO optimizers.

You will find in this plugin many models for Bayesian Optimization: Gaussian Process, Gaussian Process with MCMC, Random Forest, DNGO and BOHAMIANN.

For more information, you can find the documentation at epistimio.github.io/orion.algo.robo.

Parallel Strategies

A parallel strategy is a method to improve parallel optimization for sequential algorithms. Such algorithms can only observe trials that are completed and have a corresponding objective. To get around this, parallel strategies produces lies, noncompleted trials with fake objectives, which can be used by algorithms to avoid exploring space nearby pending or broken trials. The strategies will differ in how they assign objectives to the lies.

NoParallelStrategy

Does not return any lie. This is useful to benchmark parallel strategies and measure how they can help compared to no strategy.

class orion.algo.parallel_strategy.NoParallelStrategy(*args, **kwargs)[source]

No parallel strategy

StubParallelStrategy

Assign to lies an objective of None so that non-completed trials are observed and identifiable by algorithms that can leverage parallel optimization.

The value of the objective is customizable with stub_value.

class orion.algo.parallel_strategy.StubParallelStrategy(stub_value=None)[source]

Parallel strategy that returns static objective value for incompleted trials.

MaxParallelStrategy

Assigns to lies the best objective observed so far.

The default value assigned to objective when less than 1 trial is completed is configurable with default_result. It is float('inf') by default.

class orion.algo.parallel_strategy.MaxParallelStrategy(default_result=inf)[source]

Parallel strategy that uses the max of completed objectives

Attributes
max_result

MeanParallelStrategy

Assigns to lies the mean of all objectives observed so far.

The default value assigned to objective when less than 2 trials are completed is configurable with default_result. It is float('inf') by default.

class orion.algo.parallel_strategy.MeanParallelStrategy(default_result=inf)[source]

Parallel strategy that uses the mean of completed objectives

Attributes
mean_result

StatusBasedParallelStrategy

Uses a different strategy based on the status of the trial at hand.

class orion.algo.parallel_strategy.StatusBasedParallelStrategy(strategy_configs=None, default_strategy=None)[source]

Different parallel strategies for different trial status

Parameters
strategy_configs: dict

Dictionary of strategy configurations. Each key should be a valid trial status.

default_strategy: dict or None, optional

Default strategy for trial status that are not defined by strategy_configs. Default is NoParallelStrategy(), which always returns None.