Note

Go to the end to download the full example code

Partial Dependencies¶

Hint

Conveys a broad overview of the search space and what has been explored during the experiment. Helps identifying best optimal regions of the space.

The partial dependency computes the average predicted performance with respect to a set of hyperparameters, marginalizing out the other hyperparameters.

To predict the performance on unseen set of hyperparameters, we train a regression model on available trial history. We build a grid of g points for a hyperparameter of interest, or a 2-D grid of g ^ 2 points for a pair of hyperparameters. We sample a group of n set of hyperparameters from the entire space to marginalize over the other hyperparameters. For each value of the grid, we compute the prediction of the regression model on all points of the group, with the hyperparameter of interest set to a value of the grid. For instance, for a 1-D grid of g points and a group of n points, we compute g * n predictions.

For a search space of d hyperparameters, the partial dependency plot is organized as a matrix of (d, d) subplots. The subplots on the diagonal show the partial dependency of each hyperparameters separately, while the subplots below the diagonal show the partial dependency between two hyperparameters. Let’s look at a simple example to make it more concrete.

orion.plotting.base.partial_dependencies(experiment, with_evc_tree=True, params=None, smoothing=0.85, verbose_hover=True, n_grid_points=10, n_samples=50, colorscale='Blues', model='RandomForestRegressor', model_kwargs=None)[source]

Make contour plots to visualize the search space of each combination of params.

Parameters

experiment: ExperimentClient or Experiment: The orion object containing the experiment data
with_evc_tree: bool, optional: Fetch all trials from the EVC tree. Default: True
params: list of str, optional: Indicates the parameters to include in the plots. All parameters are included by default.
smoothing: float, optional: Smoothing applied to the countor plot. 0 corresponds to no smoothing. Default is 0.85.
verbose_hover: bool: Indicates whether to display the hyperparameter in hover tooltips. True by default.
colorscale: str, optional: The colorscale used for the contour plots. Supported values depends on the backend. Default is ‘Blues’.
n_grid_points: int, optional: Number of points in the grid to compute partial dependency. Default is 10.
n_samples: int, optional: Number of samples to randomly generate the grid used to compute the partial dependency. Default is 50.
model: str: Name of the regression model to use. Can be one of - AdaBoostRegressor - BaggingRegressor - ExtraTreesRegressor - GradientBoostingRegressor - RandomForestRegressor (Default)
model_kwargs: dict, optional: Arguments for the regressor model.

Returns

plotly.graph_objects.Figure

Raises

ValueError: If no experiment is provided.

The partial dependencies plot can be executed directly from the experiment with plot.partial_dependencies() as shown in the example below.

from orion.client import get_experiment

# flake8: noqa

# Specify the database where the experiments are stored. We use a local PickleDB here.
storage = dict(type="legacy", database=dict(type="pickleddb", host="../db.pkl"))

# Load the data for the specified experiment
experiment = get_experiment("2-dim-exp", storage=storage)
fig = experiment.plot.partial_dependencies()
fig

algorithms is deprecated and will be removed in v0.4.0. Use algorithm instead.
`strategy` option is not supported anymore. It should be set in algorithm configuration directly.

For the plots on the diagonal, the y-axis is the objective and the x-axis is the value of the corresponding hyperparameter. For the contour plots below the diagonal, the y-axis and x-axis are the values of the corresponding hyperparameters labelled on the left and at the bottom. The objective is represented as a color gradient in the contour plots. The light blue area in the plots on the diagonal represents the standard deviation of the predicted objective when varying the other hyperparameters over the search space. The black dots represents the trials in the current history of the experiment. If you hover your cursor over one dot, you will see the configuration of the corresponding trial following this format:

ID: <trial id>
value: <objective>
time: <completed time>
parameters
  <name>: <value>

Even for a simple 2-d search space, the partial dependency is very useful. We see very clearly in this example the optimal regions for both hyperparameters and we can see as well that the optimal region for learning rates is larger when the dropout is low, and narrower when dropout approaches 0.5.

Todo

Make one toy example where two HPs are dependent.

Options¶

Params¶

The simple example involved only 2 hyperparameters, but typical search spaces can be much larger. The partial dependency plot becomes hard to read with more than 3-5 hyperparameters dependency on the size of your screen. With a fix width like in this documentation, 5 hyperparameters are impossible to read as you can see below. (Data coming from tutorial Checkpointing trials)

experiment = get_experiment("hyperband-cifar10", storage=storage)
experiment.plot.partial_dependencies()

`strategy` option is not supported anymore. It should be set in algorithm configuration directly.

You can select the hyperparameters to show with the argument params.

experiment.plot.partial_dependencies(params=["gamma", "learning_rate"])

Grid resolution¶

The grid used for the partial dependency can be more or less coarse. Coarser grids will be faster to compute.

import time

experiment = get_experiment("2-dim-exp", storage=storage)
start = time.perf_counter()
fig = experiment.plot.partial_dependencies(n_grid_points=5)
print(time.perf_counter() - start, "seconds to compute")
fig

algorithms is deprecated and will be removed in v0.4.0. Use algorithm instead.
`strategy` option is not supported anymore. It should be set in algorithm configuration directly.
0.4272324909998133 seconds to compute

With more points the grid is finer.

start = time.perf_counter()
fig = experiment.plot.partial_dependencies(n_grid_points=50)
print(time.perf_counter() - start, "seconds to compute")
fig

12.373629239000365 seconds to compute

Dependency approximation¶

By default, the hyperparameters are marginalized over 50 points. This may be suitable for a small 2-D search space but likely insufficient for 5 dimensions or more. Here is an example with only 5 samples.

start = time.perf_counter()
fig = experiment.plot.partial_dependencies(n_samples=5)
print(time.perf_counter() - start, "seconds to compute")
fig

0.6986110210000334 seconds to compute

And now with 200 samples.

start = time.perf_counter()
fig = experiment.plot.partial_dependencies(n_samples=200)
print(time.perf_counter() - start, "seconds to compute")
fig

0.9950102750008227 seconds to compute

Special cases¶

Logarithmic scale¶

Dimensions with a logarithmic prior loguniform(low, high) are linearized before being passed to the regression model (using log(dim) instead of dim directly). This means the model is trained and will be making predictions in the linearized space. The data is presented in logarithmic scale to the user, with the axis adjusted to log scale as well.

Fidelity¶

The fidelity is considered as a logarithmic scale as well. See above for more information on how it is handled.

Dimension with shape¶

If some dimensions have a shape larger than 1, they will be flattened so that each subdimension can be represented separately in the subplots.

# Load the data for the specified experiment
experiment = get_experiment("2-dim-shape-exp", storage=storage)
experiment.plot.partial_dependencies()

algorithms is deprecated and will be removed in v0.4.0. Use algorithm instead.
`strategy` option is not supported anymore. It should be set in algorithm configuration directly.

In the example above, the dimension learning_rate~loguniform(1e-5, 1e-2, shape=3) is flattened and represented with learning_rate[i]. If the shape would be or more dimensions (ex: (3, 2)), the indices would be learning_rate[i,j] with i=0..2 and j=0..1.

The flattened hyperparameters can be fully selected with params=['<name>'].

experiment.plot.partial_dependencies(params=["/learning_rate"])

Or a subset of the flattened hyperparameters can be selected with params=['<name>[index]'].

experiment.plot.partial_dependencies(params=["/learning_rate[0]", "/learning_rate[1]"])

Categorical dimension¶

Categorical dimensions are converted into integer values, so that the regression model can handle them. The integers are simply indices that are assigned to each category in arbitrary order. Here is an example where dimension mt-join has the prior choices(['mean', 'max', 'concat']).

# Load the data for the specified experiment
experiment = get_experiment("3-dim-cat-shape-exp", storage=storage)
experiment.plot.partial_dependencies(params=["/learning_rate[0]", "/mt-join"])

algorithms is deprecated and will be removed in v0.4.0. Use algorithm instead.
`strategy` option is not supported anymore. It should be set in algorithm configuration directly.

Note

For now categorical are plotted using a scatter plot, but a heat-map would be more readable. Contributions are more than welcome! :)

Finally we save the image to serve as a thumbnail for this example. See the guide How to save for more information on image saving.

fig.write_image("../../docs/src/_static/par_dep_thumbnail.png")

Total running time of the script: ( 0 minutes 32.019 seconds)

Gallery generated by Sphinx-Gallery