Analysis

Provides hyperparameter optimization (HPO) analysis tools.

orion.analysis.average(trials, group_by='order', key='best', return_var=False)

Compute the average of some trial attribute.

By default, it computes the average objective at each time step across multiple experiments.

Parameters
trials: DataFrame

A dataframe of trials containing, at least, the columns ‘best’ and ‘order’.

group_by: str, optional

The attribute used to group trials for the average. By default, trials are grouped by order (e.g., all first trials across experiments).

key: str, optional

One attribute, or several attributes separated by ‘,’, to average. Defaults to ‘best’ as returned by orion.analysis.regret.

return_var: bool, optional

If True, add a column ‘{key}_var’, where ‘{key}’ is the value of the argument key. Defaults to False.

Returns
A dataframe with columns ‘order’, ‘{key}_mean’ and, if return_var is True, ‘{key}_var’.
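
The grouping described above can be sketched in plain Python. This is an illustration only, not necessarily how orion.analysis.average is implemented: the helper name average_by, the list-of-dicts representation of trials, the example data and the use of the sample variance are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, variance


def average_by(trials, group_by="order", key="best", return_var=False):
    """Average `key` over trials that share the same `group_by` value."""
    groups = defaultdict(list)
    for trial in trials:
        groups[trial[group_by]].append(trial[key])

    rows = []
    for group_value in sorted(groups):
        values = groups[group_value]
        row = {group_by: group_value, f"{key}_mean": mean(values)}
        if return_var:
            # Assumption: sample variance (n - 1 denominator),
            # which needs at least two values per group.
            row[f"{key}_var"] = variance(values)
        rows.append(row)
    return rows


# Two hypothetical experiments with three trials each.
trials = [
    {"experiment": "a", "order": 0, "best": 4.0},
    {"experiment": "a", "order": 1, "best": 3.0},
    {"experiment": "a", "order": 2, "best": 1.0},
    {"experiment": "b", "order": 0, "best": 2.0},
    {"experiment": "b", "order": 1, "best": 2.0},
    {"experiment": "b", "order": 2, "best": 2.0},
]

averaged = average_by(trials, return_var=True)
for row in averaged:
    print(row)
```

Each output row corresponds to one value of ‘order’, so the first row averages the first trial of every experiment.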
orion.analysis.lpi(trials, space, mode='best', model='RandomForestRegressor', n_points=20, n_runs=10, **kwargs)

Calculates the Local Parameter Importance for a collection of orion.core.worker.trial.Trial.

For more information on the metric, see original paper at https://ml.informatik.uni-freiburg.de/papers/18-LION12-CAVE.pdf.

Biedenkapp, André, et al. “Cave: Configuration assessment, visualization and evaluation.” International Conference on Learning and Intelligent Optimization. Springer, Cham, 2018.

Parameters
trials: DataFrame or dict

A dataframe of trials containing, at least, the columns ‘objective’ and ‘id’. Or a dict equivalent.

space: Space object

A space object from an experiment.

mode: str

Mode to compute the LPI.

- best: Take the best trial found as the anchor for the LPI.
- linear: Recompute the LPI for all values on a grid.

model: str

Name of the regression model to use. Can be one of:

- AdaBoostRegressor
- BaggingRegressor
- ExtraTreesRegressor
- GradientBoostingRegressor
- RandomForestRegressor (Default)

n_points: int

Number of points to compute the variances. Default is 20.

n_runs: int

Number of runs to compute the standard error of the LPI. Default is 10.

**kwargs

Arguments for the regressor model.

Returns
DataFrame

LPI value for each parameter. If mode is ‘linear’, the returned DataFrame lists the parameter values together with their LPI metrics.
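
To make the metric more concrete, here is a minimal plain-Python sketch of the idea behind ‘best’ mode: vary one parameter at a time around an anchor point, measure the variance of the predicted objective, and normalize across parameters. It is not orion.analysis.lpi itself. The helper local_importance, the hand-written surrogate standing in for a fitted regressor, the parameter names and bounds, and the normalization by the sum of variances (based on the definition in the paper cited above) are all assumptions for illustration.

```python
from statistics import pvariance


def local_importance(predict, anchor, bounds, n_points=20):
    """Variance of `predict` along each parameter around `anchor`, normalized to sum to 1."""
    variances = {}
    for name, (low, high) in bounds.items():
        step = (high - low) / (n_points - 1)
        predictions = []
        for i in range(n_points):
            # Move one parameter along its grid; keep the others at the anchor.
            point = {**anchor, name: low + i * step}
            predictions.append(predict(point))
        variances[name] = pvariance(predictions)

    total = sum(variances.values())
    if total == 0:
        # No parameter changes the prediction around the anchor.
        return {name: 0.0 for name in variances}
    return {name: var / total for name, var in variances.items()}


def surrogate(point):
    """Hypothetical surrogate: strong effect of `lr`, weak effect of `momentum`, none of `seed`."""
    return 10.0 * (point["lr"] - 0.5) ** 2 + 1.0 * (point["momentum"] - 0.5) ** 2


bounds = {"lr": (0.0, 1.0), "momentum": (0.0, 1.0), "seed": (0.0, 1.0)}
anchor = {"lr": 0.5, "momentum": 0.5, "seed": 0.5}  # stands in for the best trial found

importance = local_importance(surrogate, anchor, bounds)
for name, value in importance.items():
    print(f"{name}: {value:.4f}")
```

In this toy setup, almost all of the importance goes to lr, a small share to momentum, and none to seed, which matches how the surrogate was constructed.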

orion.analysis.partial_dependency(trials, space, params=None, model='RandomForestRegressor', n_grid_points=10, n_samples=50, **kwargs)

Calculates the partial dependency of parameters in a collection of orion.core.worker.trial.Trial.

Parameters
trials: DataFrame or dict

A dataframe of trials containing, at least, the columns ‘objective’ and ‘id’. Or a dict equivalent.

space: Space object

A space object from an experiment.

params: list of str, optional

The parameters to include in the computation. All parameters are included by default.

model: str

Name of the regression model to use. Can be one of:

- AdaBoostRegressor
- BaggingRegressor
- ExtraTreesRegressor
- GradientBoostingRegressor
- RandomForestRegressor (Default)

n_grid_points: int

Number of points in the grid to compute partial dependency. Default is 10.

n_samples: int

Number of samples to randomly generate the grid used to compute the partial dependency. Default is 50.

**kwargs

Arguments for the regressor model.

Returns
dict

Dictionary of DataFrames. Keys are each pair of parameters (dim1.name, dim2.name) as well as each parameter individually (dim1.name). Columns are (dim1.name, dim2.name, objective) for pairs or (dim1.name, objective) for single parameters.
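
The following plain-Python sketch shows one common way to estimate a one-dimensional partial dependency: fix a parameter to each grid value in turn and average the model's prediction over randomly drawn samples. It is a simplified illustration, not the code behind orion.analysis.partial_dependency. The helper partial_dependence_1d, the hand-written surrogate standing in for a fitted regressor, and the parameters x and y are assumptions made for this example.

```python
import random
from statistics import mean


def partial_dependence_1d(predict, samples, name, grid):
    """Average prediction over `samples` with parameter `name` fixed to each grid value."""
    rows = []
    for value in grid:
        # Overwrite one parameter in every sample, then average the predictions.
        predictions = [predict({**sample, name: value}) for sample in samples]
        rows.append({name: value, "objective": mean(predictions)})
    return rows


def surrogate(point):
    """Hypothetical surrogate: quadratic in `x`, linear in `y`."""
    return (point["x"] - 1.0) ** 2 + point["y"]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [{"x": rng.uniform(0, 2), "y": rng.uniform(0, 1)} for _ in range(50)]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]

curve = partial_dependence_1d(surrogate, samples, "x", grid)
for row in curve:
    print(row)
```

Each row pairs a grid value of x with the averaged objective, mirroring the (dim1.name, objective) column layout described above.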

orion.analysis.ranking(trials, group_by='order', key='best')

Compute the ranking of some trial attribute.

By default, it computes the ranking with respect to objectives at each time step across multiple experiments.

Parameters
trials: DataFrame

A dataframe of trials containing, at least, the columns ‘best’ and ‘order’.

group_by: str, optional

The attribute used to group trials for the ranking. By default, trials are grouped by order (e.g., all first trials across experiments).

key: str, optional

The attribute to use for the ranking. Defaults to ‘best’ as returned by orion.analysis.regret.

Returns
A copy of the original dataframe with a new column ‘rank’ for the rankings.
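
As a rough illustration of ranking within groups, the plain-Python sketch below ranks the ‘best’ value of each experiment at every ‘order’. It is not the implementation of orion.analysis.ranking. The helper rank_within_groups, the list-of-dicts representation, the example data, the convention that lower values rank first, and the tie handling (ties keep input order) are assumptions for this example.

```python
from collections import defaultdict


def rank_within_groups(trials, group_by="order", key="best"):
    """Return copies of `trials` with a 'rank' field computed inside each group."""
    groups = defaultdict(list)
    for index, trial in enumerate(trials):
        groups[trial[group_by]].append(index)

    ranked = [dict(trial) for trial in trials]  # leave the input untouched
    for indices in groups.values():
        # Assumption: lower is better; the stable sort keeps input order on ties.
        ordered = sorted(indices, key=lambda i: trials[i][key])
        for rank, index in enumerate(ordered, start=1):
            ranked[index]["rank"] = rank
    return ranked


# Three hypothetical experiments observed at two time steps.
trials = [
    {"experiment": "a", "order": 0, "best": 4.0},
    {"experiment": "b", "order": 0, "best": 2.0},
    {"experiment": "c", "order": 0, "best": 3.0},
    {"experiment": "a", "order": 1, "best": 1.0},
    {"experiment": "b", "order": 1, "best": 2.0},
    {"experiment": "c", "order": 1, "best": 2.5},
]

ranked = rank_within_groups(trials)
for row in ranked:
    print(row)
```

Experiment a ranks last at order 0 but first at order 1, which is the kind of change over time that a per-order ranking is meant to expose.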
orion.analysis.regret(trials, names=('best', 'best_id'))

Calculates the regret for a collection of orion.core.worker.trial.Trial. The regret is calculated sequentially from the order of the collection.

Parameters
trials: DataFrame or dict

A dataframe of trials containing, at least, the columns ‘objective’ and ‘id’. Or a dict equivalent.

names: tuple of str, optional

A tuple containing the names of the two new columns. Default is (‘best’, ‘best_id’).

Returns
A copy of the original dataframe with two new columns containing, respectively, the best value so far and its trial id.
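
The sequential "best so far" computation can be sketched in plain Python as follows. This is a simplified illustration rather than the code of orion.analysis.regret. The helper running_best, the list-of-dicts representation, the example trials and the assumption that a lower objective is better are all made up for this example.

```python
def running_best(trials, names=("best", "best_id")):
    """Return copies of `trials` annotated with the best objective seen so far."""
    best_name, best_id_name = names
    best_value, best_id = float("inf"), None
    annotated = []
    for trial in trials:
        # Assumption: lower objective is better.
        if trial["objective"] < best_value:
            best_value, best_id = trial["objective"], trial["id"]
        annotated.append({**trial, best_name: best_value, best_id_name: best_id})
    return annotated


# Hypothetical trials, listed in the order they were executed.
trials = [
    {"id": "t1", "objective": 5.0},
    {"id": "t2", "objective": 3.0},
    {"id": "t3", "objective": 4.0},
    {"id": "t4", "objective": 1.0},
]

result = running_best(trials)
for row in result:
    print(row)
```

The third trial does not improve on the second one, so its ‘best’ and ‘best_id’ values still point at t2.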