Local Parameter Importance

Hint

Conveys a very compact measure of how important each hyperparameter is for achieving the best objective found so far.

The local parameter importance measures the variance of the results when varying one hyperparameter while keeping all others fixed [Biedenkapp2018].

Given a best set of hyperparameters, we build a separate grid for each hyperparameter and compute the variance of the results when keeping all other hyperparameters fixed to their values in the best set. To infer these results, we train a regression model from scikit-learn (by default RandomForestRegressor) on the trial history of the experiment and use it to predict the objective. The ratio of the variance for one hyperparameter to the sum of the variances for all hyperparameters is used as the local parameter importance metric.
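
The computation described above can be sketched in plain Python. This is a minimal illustration, not Orion's implementation: the function local_parameter_importance, its arguments, and the predict callable (a stand-in for the trained scikit-learn regressor) are hypothetical names introduced for this example.

```python
from statistics import pvariance


def linear_grid(low, high, n_points):
    """Return n_points evenly spaced values between low and high (inclusive)."""
    step = (high - low) / (n_points - 1)
    return [low + i * step for i in range(n_points)]


def local_parameter_importance(predict, best, bounds, n_points=20):
    """Sketch of the LPI ratio around the best set of hyperparameters.

    predict: hypothetical stand-in for the trained regressor; maps a dict of
        hyperparameter values to a predicted objective.
    best: dict with the best hyperparameter values found so far.
    bounds: dict mapping each hyperparameter name to its (low, high) interval.
    """
    variances = {}
    for name, (low, high) in bounds.items():
        predictions = []
        for value in linear_grid(low, high, n_points):
            point = dict(best)  # keep all other hyperparameters at their best values
            point[name] = value
            predictions.append(predict(point))
        variances[name] = pvariance(predictions)

    total = sum(variances.values())
    if total == 0:
        # The surrogate is flat everywhere on the grids: no hyperparameter matters.
        return {name: 0.0 for name in variances}
    return {name: variance / total for name, variance in variances.items()}


# Toy surrogate: the objective changes twice as fast along x as along y.
def toy_predict(point):
    return 2.0 * point["x"] + point["y"]


lpi = local_parameter_importance(
    toy_predict,
    best={"x": 0.5, "y": 0.5},
    bounds={"x": (0.0, 1.0), "y": (0.0, 1.0)},
)
print(lpi)
```

With this toy surrogate, the variance along x is four times the variance along y, so x receives about 0.8 of the importance and y about 0.2.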

orion.plotting.base.lpi(experiment, with_evc_tree=True, model='RandomForestRegressor', model_kwargs=None, n_points=20, n_runs=10, **kwargs)[source]

Make a bar plot to visualize the local parameter importance metric.

For more information on the metric, see original paper at https://ml.informatik.uni-freiburg.de/papers/18-LION12-CAVE.pdf.

Biedenkapp, André, et al. “Cave: Configuration assessment, visualization and evaluation.” International Conference on Learning and Intelligent Optimization. Springer, Cham, 2018.

Parameters
experiment: ExperimentClient or Experiment

The orion object containing the experiment data

with_evc_tree: bool, optional

Fetch all trials from the EVC tree. Default: True

model: str

Name of the regression model to use. Can be one of:

- AdaBoostRegressor
- BaggingRegressor
- ExtraTreesRegressor
- GradientBoostingRegressor
- RandomForestRegressor (Default)

model_kwargs: dict

Arguments for the regressor model.

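n_points: int

Number of points to compute the variances. Default is 20.

n_runs: int

Number of runs used to compute the standard deviation of the LPI. Default is 10.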

kwargs: dict

All other plotting keyword arguments to be passed to plotly.express.line.

Returns
plotly.graph_objects.Figure
Raises
ValueError

If no experiment is provided or if regressor name is invalid.

The local parameter importance plot can be executed directly from the experiment with plot.lpi() as shown in the example below.

from orion.client import get_experiment

# Specify the database where the experiments are stored. We use a local PickleDB here.
storage = dict(type="legacy", database=dict(type="pickleddb", host="../db.pkl"))

# Load the data for the specified experiment
experiment = get_experiment("2-dim-exp", storage=storage)
fig = experiment.plot.lpi()
fig


On this plot the x-axis shows the different hyperparameters while the y-axis gives the local parameter importance. The error bars represent the standard deviation of the LPI based on 10 runs. Remember that the LPI requires the training of a regression model. The initial state of this model can greatly influence the results. This is why the computation of the LPI is done multiple times (10 times by default) using different random seeds. Here is an example setting the number of points for the grids, the number of runs and the initial random seed for the regression model.

experiment.plot.lpi(n_points=10, n_runs=5, model_kwargs=dict(random_state=1))
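
To make the role of the repeated runs concrete, here is a small sketch of how several LPI estimates could be aggregated into a bar height and an error bar. It is an illustration under assumptions: one_lpi_run is a hypothetical stand-in for fitting the regressor with a given seed, and both the seed scheme (random_state + i) and the use of the population standard deviation are assumptions, not a description of Orion's code.

```python
import random
from statistics import mean, pstdev


def one_lpi_run(seed):
    """Hypothetical stand-in for one LPI computation with a seeded regressor."""
    rng = random.Random(seed)
    # Fake raw variances that fluctuate with the seed, as a refitted model would.
    raw = {
        "learning_rate": 0.8 + rng.uniform(-0.05, 0.05),
        "dropout": 0.2 + rng.uniform(-0.05, 0.05),
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def aggregate_lpi(compute_lpi, n_runs=10, random_state=1):
    """Repeat the LPI computation and return (mean, std) per hyperparameter."""
    runs = [compute_lpi(random_state + i) for i in range(n_runs)]
    summary = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        summary[name] = (mean(values), pstdev(values))
    return summary


summary = aggregate_lpi(one_lpi_run, n_runs=5, random_state=1)
for name, (average, deviation) in summary.items():
    print(f"{name}: {average:.3f} +/- {deviation:.3f}")
```

Fixing the initial seed makes the whole sequence of runs reproducible, while the spread across runs indicates how sensitive the estimate is to the state of the regression model.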


We can see that the learning rate had a larger impact than the dropout in achieving the best objective. However, a search space of only 2 dimensions is easy to analyse visually, so the LPI has little additional value in this example. We need an example with a larger search space to better show the utility of the LPI. We will load results from the tutorial Checkpointing trials for this.

# Load the data for the specified experiment
experiment = get_experiment("hyperband-cifar10", storage=storage)
experiment.plot.lpi()


There is a large difference here between the most important hyperparameter (learning rate) and the least important one (gamma).

One caveat of LPI is that the variance for each hyperparameter depends on the search space. If the prior for one hyperparameter is narrow and fits the region of best values for this hyperparameter, then the variance will be low and this hyperparameter will be considered unimportant. It may be important, but it is not important to optimize it within this narrow search space. A related issue is that if one hyperparameter has a dramatic effect, it will lead to a variance so large that the other hyperparameters will seem irrelevant in comparison. This is what we observe here with the learning rate. If we branch from the experiment and define a narrower prior for the learning rate, we will see that it becomes an unimportant hyperparameter. See the documentation on Experiment Version Control for more information on branching, or orion.client.build_experiment() for information on branching arguments. The original learning rate prior was loguniform(1e-5, 0.1). We will narrow it to loguniform(1e-3, 0.1).

from orion.client import build_experiment

# Branch from "hyperband-cifar10" with a narrower search space.
experiment = build_experiment(
    "narrow-hyperband-cifar10",
    branching={"branch_from": "hyperband-cifar10"},
    space={
        "epochs": "fidelity(1, 120, base=4)",
        "learning_rate": "loguniform(1e-3, 0.1)",
        "momentum": "uniform(0, 0.9)",
        "weight_decay": "loguniform(1e-10, 1e-2)",
        "gamma": "loguniform(0.97, 1)",
    },
    storage=storage,
)

experiment.plot.lpi()

Out:

Running experiment in a different state:
Remaining conflicts:

     learning_rate~loguniform(1e-05, 0.1) != learning_rate~loguniform(0.001, 0.1)
     Experiment name 'hyperband-cifar10' already exist with version '1'
     v0.1.9.rc.post209.dev0+g0605286c != v0.1.17.post0.dev0+g06839449


The prior of the learning rate is arguably still large, spanning 2 orders of magnitude (0.001, 0.1). Nevertheless, for this problem, most learning rates within this range lead to optimal results whenever the other hyperparameters are optimal. What you must remember is that defining too narrow a search space may lead to misleading local parameter importance. See Partial Dependencies for a visualization to verify whether the search space you defined may be too narrow.
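
The effect of the prior width on the variance can be reproduced with a toy example. The sketch below is an illustration only: toy_objective is a hypothetical objective that is flat for learning rates of 1e-3 and above and degrades below, and it is not fitted to the experiment discussed here.

```python
import math
from statistics import pvariance


def log_grid(low, high, n_points=20):
    """Return n_points values evenly spaced in log space between low and high."""
    log_low, log_high = math.log10(low), math.log10(high)
    step = (log_high - log_low) / (n_points - 1)
    return [10 ** (log_low + i * step) for i in range(n_points)]


def toy_objective(learning_rate):
    """Hypothetical objective: flat for learning rates >= 1e-3, worse below."""
    return 0.1 + max(0.0, -3.0 - math.log10(learning_rate))


# Same objective, two priors: the original wide one and the narrowed one.
wide_variance = pvariance([toy_objective(lr) for lr in log_grid(1e-5, 0.1)])
narrow_variance = pvariance([toy_objective(lr) for lr in log_grid(1e-3, 0.1)])
print(wide_variance, narrow_variance)
```

On the wide grid the variance is clearly positive, while on the narrow grid it is essentially zero, even though the hyperparameter and the objective are unchanged.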

Special cases

Logarithmic scale

Dimensions with a logarithmic prior loguniform(low, high) are linearized before being passed to the regression model (using log(dim) instead of dim directly). This means the model is trained and will be making predictions in the linearized space.
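
As a small illustration of this linearization, the sketch below applies a logarithm to values drawn from a loguniform prior. The helper linearize is a hypothetical name, and the natural logarithm is an assumption; any base gives evenly spaced values.

```python
import math


def linearize(values):
    """Map values sampled from a loguniform prior to log space."""
    return [math.log(value) for value in values]


learning_rates = [1e-5, 1e-4, 1e-3, 1e-2]
features = linearize(learning_rates)

# Successive differences are constant in log space (one decade each).
gaps = [b - a for a, b in zip(features, features[1:])]
print(gaps)
```

Values that span several orders of magnitude become evenly spaced, which is the representation the regression model sees.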

Dimension with shape

If some dimensions have a shape larger than 1, they will be flattened so that each subdimension can be represented in the bar plot.

# Load the data for the specified experiment
experiment = get_experiment("2-dim-shape-exp", storage=storage)
experiment.plot.lpi()


In the example above, the dimension learning_rate~loguniform(1e-5, 1e-2, shape=3) is flattened and represented with learning_rate[i]. If the shape had two or more dimensions (e.g. (3, 2)), the indices would be learning_rate[i,j] with i=0..2 and j=0..1.
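
The labelling scheme can be sketched as follows. The helper flattened_names is hypothetical and only mimics the naming convention described here; it is not Orion's flattening code.

```python
from itertools import product


def flattened_names(name, shape):
    """Return one label per subdimension, e.g. learning_rate[0,1]."""
    if isinstance(shape, int):
        shape = (shape,)
    labels = []
    for index in product(*(range(size) for size in shape)):
        labels.append(f"{name}[{','.join(str(i) for i in index)}]")
    return labels


print(flattened_names("learning_rate", 3))
print(flattened_names("learning_rate", (3, 2)))
```

A shape of 3 yields three labels, and a shape of (3, 2) yields six, one for each bar in the plot.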

Categorical dimension

Categorical dimensions are converted into integer values, so that the regression model can handle them. The integers are simply indices that are assigned to each category in arbitrary order. Here is an example where dimension mt-join has the prior choices(['mean', 'max', 'concat']).

# Load the data for the specified experiment
experiment = get_experiment("3-dim-cat-shape-exp", storage=storage)
experiment.plot.lpi()
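
The integer encoding of categories can be illustrated with a few lines of Python. This is a sketch with a hypothetical helper, encode_categories, and a hypothetical list of trial values; the order of the indices is arbitrary, as noted above, and here simply follows the order of the list.

```python
def encode_categories(categories):
    """Assign an integer index to each category, in the order given."""
    return {category: index for index, category in enumerate(categories)}


mapping = encode_categories(["mean", "max", "concat"])

# Hypothetical values of the categorical dimension across four trials.
trial_values = ["max", "mean", "concat", "max"]
encoded = [mapping[value] for value in trial_values]
print(encoded)
```

The resulting integers only identify the categories; their numeric order carries no meaning.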


Finally, we save the image to serve as a thumbnail for this example. See the guide How to save for more information on image saving.

fig.write_image("../../docs/src/_static/lpi_thumbnail.png")
Biedenkapp2018

Biedenkapp, André, et al. “Cave: Configuration assessment, visualization and evaluation.” International Conference on Learning and Intelligent Optimization. Springer, Cham, 2018.
