Partial Dependencies

Hint

Conveys a broad overview of the search space and of what has been explored during the experiment. Helps identify the optimal regions of the space.

The partial dependency computes the average predicted performance with respect to a set of hyperparameters, marginalizing out the other hyperparameters.

To predict the performance on unseen sets of hyperparameters, we train a regression model on the available trial history. We build a grid of g points for a hyperparameter of interest, or a 2-D grid of g ^ 2 points for a pair of hyperparameters. We then sample a group of n sets of hyperparameters from the entire space to marginalize over the other hyperparameters. For each value of the grid, we compute the prediction of the regression model on all points of the group, with the hyperparameter of interest set to that grid value. For instance, for a 1-D grid of g points and a group of n points, we compute g * n predictions.
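
To make the procedure concrete, here is a minimal sketch of the 1-D case written directly with scikit-learn and NumPy. The data, grid size, and number of samples are purely illustrative, and this is not Orion's internal implementation.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical trial history: two hyperparameters (columns) and their observed objectives.
X = rng.uniform(size=(100, 2))
objectives = (X[:, 0] - 0.3) ** 2 + X[:, 1]

model = RandomForestRegressor().fit(X, objectives)

g, n = 10, 50
grid = np.linspace(0, 1, g)         # grid for the hyperparameter of interest (column 0)
samples = rng.uniform(size=(n, 2))  # points used to marginalize over the other hyperparameter

partial_dependency = []
for value in grid:
    points = samples.copy()
    points[:, 0] = value            # fix the hyperparameter of interest to the grid value
    partial_dependency.append(model.predict(points).mean())  # average over the n samples

# `partial_dependency` now holds g averaged values, computed from g * n predictions in total.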

For a search space of d hyperparameters, the partial dependency plot is organized as a matrix of (d, d) subplots. The subplots on the diagonal show the partial dependency of each hyperparameter separately, while the subplots below the diagonal show the partial dependency of each pair of hyperparameters. Let’s look at a simple example to make it more concrete.

orion.plotting.base.partial_dependencies(experiment, with_evc_tree=True, params=None, smoothing=0.85, verbose_hover=True, n_grid_points=10, n_samples=50, colorscale='Blues', model='RandomForestRegressor', model_kwargs=None)[source]

Make contour plots to visualize the search space of each combination of params.

Parameters
experiment: ExperimentClient or Experiment

The orion object containing the experiment data

with_evc_tree: bool, optional

Fetch all trials from the EVC tree. Default: True

params: list of str, optional

Indicates the parameters to include in the plots. All parameters are included by default.

smoothing: float, optional

Smoothing applied to the contour plot. 0 corresponds to no smoothing. Default is 0.85.

verbose_hover: bool

Indicates whether to display the hyperparameter values in hover tooltips. True by default.

colorscale: str, optional

The colorscale used for the contour plots. Supported values depend on the backend. Default is ‘Blues’.

n_grid_points: int, optional

Number of points in the grid to compute partial dependency. Default is 10.

n_samples: int, optional

Number of points sampled randomly from the search space to marginalize over the other hyperparameters when computing the partial dependency. Default is 50.

model: str

Name of the regression model to use. Can be one of:

- AdaBoostRegressor
- BaggingRegressor
- ExtraTreesRegressor
- GradientBoostingRegressor
- RandomForestRegressor (default)

model_kwargs: dict, optional

Arguments for the regressor model.

Returns
plotly.graph_objects.Figure
Raises
ValueError

If no experiment is provided.
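
For reference, a call combining several non-default arguments might look like the sketch below. It assumes experiment is an ExperimentClient (loaded as in the next example) and that the model_kwargs keys are forwarded to the underlying scikit-learn regressor; the hyperparameter names are hypothetical.

# Sketch only: the model_kwargs keys shown are assumed to be scikit-learn arguments.
fig = experiment.plot.partial_dependencies(
    params=["dropout", "learning_rate"],   # hypothetical hyperparameter names
    n_grid_points=20,
    n_samples=100,
    colorscale="Viridis",
    model="GradientBoostingRegressor",
    model_kwargs={"n_estimators": 200},
)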

The partial dependencies plot can be generated directly from the experiment with plot.partial_dependencies(), as shown in the example below.

from orion.client import get_experiment

# Specify the database where the experiments are stored. We use a local PickleDB here.
storage = dict(type="legacy", database=dict(type="pickleddb", host="../db.pkl"))

# Load the data for the specified experiment
experiment = get_experiment("2-dim-exp", storage=storage)
fig = experiment.plot.partial_dependencies()
fig



For the plots on the diagonal, the y-axis is the objective and the x-axis is the value of the corresponding hyperparameter. For the contour plots below the diagonal, the y-axis and x-axis are the values of the corresponding hyperparameters labelled on the left and at the bottom. The objective is represented as a color gradient in the contour plots. The light blue area in the plots on the diagonal represents the standard deviation of the predicted objective when varying the other hyperparameters over the search space. The black dots represent the trials in the current history of the experiment. If you hover your cursor over a dot, you will see the configuration of the corresponding trial in the following format:

ID: <trial id>
value: <objective>
time: <completed time>
parameters
  <name>: <value>

Even for a simple 2-D search space, the partial dependency is very useful. In this example we see very clearly the optimal regions for both hyperparameters, and we can also see that the optimal region for the learning rate is larger when the dropout is low and narrower when the dropout approaches 0.5.

Todo

Make one toy example where two HPs are dependent.

Options

Params

The simple example involved only 2 hyperparameters, but typical search spaces can be much larger. The partial dependency plot becomes hard to read with more than 3-5 hyperparameters, depending on the size of your screen. With a fixed width like in this documentation, 5 hyperparameters are impossible to read, as you can see below. (Data coming from the tutorial Checkpointing trials.)

experiment = get_experiment("hyperband-cifar10", storage=storage)
experiment.plot.partial_dependencies()



You can select the hyperparameters to show with the argument params.

experiment.plot.partial_dependencies(params=["gamma", "learning_rate"])


Grid resolution

The grid used for the partial dependency can be more or less coarse. Coarser grids will be faster to compute.

import time

experiment = get_experiment("2-dim-exp", storage=storage)
start = time.perf_counter()
fig = experiment.plot.partial_dependencies(n_grid_points=5)
print(time.perf_counter() - start, "seconds to compute")
fig

Out:

0.5729105249999975 seconds to compute


With more points, the grid is finer, at the cost of a longer computation time.

start = time.perf_counter()
fig = experiment.plot.partial_dependencies(n_grid_points=50)
print(time.perf_counter() - start, "seconds to compute")
fig

Out:

21.856610058999877 seconds to compute


Dependency approximation

By default, the hyperparameters are marginalized over 50 points. This may be suitable for a small 2-D search space but is likely insufficient for 5 dimensions or more. Here is an example with only 5 samples.

start = time.perf_counter()
fig = experiment.plot.partial_dependencies(n_samples=5)
print(time.perf_counter() - start, "seconds to compute")
fig

Out:

1.1984557260000201 seconds to compute


And now with 200 samples.

start = time.perf_counter()
fig = experiment.plot.partial_dependencies(n_samples=200)
print(time.perf_counter() - start, "seconds to compute")
fig

Out:

1.5375449099999514 seconds to compute


Special cases

Logarithmic scale

Dimensions with a logarithmic prior loguniform(low, high) are linearized before being passed to the regression model (using log(dim) instead of dim directly). This means the model is trained and makes its predictions in the linearized space. The data is presented to the user on a logarithmic scale, with the axes adjusted accordingly.
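
As a rough sketch of what the linearization means, assuming hypothetical learning rate values drawn from loguniform(1e-5, 1e-2):

import numpy as np

# Minimal sketch of the linearization, with hypothetical values.
learning_rates = np.array([1e-5, 1e-4, 1e-3, 1e-2])  # samples from loguniform(1e-5, 1e-2)
linearized = np.log(learning_rates)                   # what the regression model sees
# The model is trained and queried on `linearized`; the plot axis is then set to a
# logarithmic scale so that the original values are displayed.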

Fidelity

The fidelity dimension is treated as having a logarithmic scale as well. See above for more information on how it is handled.

Dimension with shape

If some dimensions have a shape larger than 1, they will be flattened so that each subdimension can be represented separately in the subplots.
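
As a rough illustration of the flattening, assuming a hypothetical trial value for learning_rate~loguniform(1e-5, 1e-2, shape=3), each entry gets its own key:

# Hypothetical illustration of how a dimension of shape 3 is flattened into separate keys.
values = [1e-4, 5e-4, 1e-3]   # one trial's value for a dimension with shape=3
flattened = {f"learning_rate[{i}]": value for i, value in enumerate(values)}
print(flattened)  # {'learning_rate[0]': 0.0001, 'learning_rate[1]': 0.0005, 'learning_rate[2]': 0.001}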

# Load the data for the specified experiment
experiment = get_experiment("2-dim-shape-exp", storage=storage)
experiment.plot.partial_dependencies()



In the example above, the dimension learning_rate~loguniform(1e-5, 1e-2, shape=3) is flattened and represented with learning_rate[i]. If the shape had two or more dimensions (ex: (3, 2)), the indices would be learning_rate[i,j] with i=0..2 and j=0..1.

The flattened hyperparameters can be fully selected with params=['<name>'].

experiment.plot.partial_dependencies(params=["/learning_rate"])


Or a subset of the flattened hyperparameters can be selected with params=['<name>[index]'].

experiment.plot.partial_dependencies(params=["/learning_rate[0]", "/learning_rate[1]"])


Categorical dimension

Categorical dimensions are converted into integer values so that the regression model can handle them. The integers are simply indices assigned to each category in an arbitrary order, as illustrated in the sketch below. The example that follows uses the dimension mt-join with the prior choices(['mean', 'max', 'concat']).
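
Below is a minimal sketch of this encoding; the exact assignment of indices is arbitrary and may differ from what Orion does internally.

# Minimal sketch: categories are mapped to arbitrary integer indices for the regressor.
categories = ["mean", "max", "concat"]   # prior: choices(['mean', 'max', 'concat'])
encoding = {name: index for index, name in enumerate(categories)}
print(encoding)   # {'mean': 0, 'max': 1, 'concat': 2}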

# Load the data for the specified experiment
experiment = get_experiment("3-dim-cat-shape-exp", storage=storage)
experiment.plot.partial_dependencies(params=["/learning_rate[0]", "/mt-join"])



Note

For now, categorical dimensions are plotted using a scatter plot, but a heat map would be more readable. Contributions are more than welcome! :)

Finally we save the image to serve as a thumbnail for this example. See the guide How to save for more information on image saving.

fig.write_image("../../docs/src/_static/par_dep_thumbnail.png")
