Scikit-learn

In this tutorial, we’re going to demonstrate how Oríon can be integrated with a minimal model using scikit-learn on the iris dataset. The files mentioned in this tutorial are available at examples/scikitlearn-iris/ in Oríon’s repository.

The requirements are listed in requirements.txt. You can quickly install them using $ pip install -r requirements.txt. If you haven’t installed Oríon previously, make sure to configure it properly before going further.

Sample script

import sys

from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

from orion.client import report_objective

# Parsing the value for the hyper-parameter 'epsilon' given as a command line argument.
hyper_epsilon = sys.argv[1]
print(f"Epsilon is {hyper_epsilon}")

# Loading the iris dataset and splitting it into training and testing set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Training the model with the training set with the specified 'epsilon' to control the huber loss.
clf = SGDClassifier(loss="huber", epsilon=float(hyper_epsilon))
clf.fit(X_train, y_train)

# Evaluating the accuracy using the testing set.
y_pred = clf.predict(X_test)
accuracy = balanced_accuracy_score(y_test, y_pred)

# Reporting the results
print(f"Accuracy is {accuracy}")

This very basic script takes a single positional argument, the hyper-parameter epsilon, which controls the Huber loss used for training.

The script is divided into five parts:

  1. Parsing of the script arguments

  2. Loading and splitting the dataset

  3. Training a classifier using the researcher-defined epsilon

  4. Evaluating the classifier using the testing set

  5. Reporting the performance of the model, i.e., its accuracy.
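Step 1 reads epsilon straight from sys.argv[1], so a missing or non-numeric argument only fails later with a raw IndexError or ValueError. A slightly more defensive variant could use the standard-library argparse module. The sketch below is an illustration, not part of the tutorial's files, and the helper name parse_epsilon is hypothetical. Because Oríon replaces the 'orion~...' expression on the command line with a sampled value, a positional float argument still fits that workflow.

```python
import argparse
from typing import List, Optional


def parse_epsilon(argv: Optional[List[str]] = None) -> float:
    """Parse the Huber-loss epsilon from the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Train an SGDClassifier on iris.")
    # Positional argument: the sampled value is passed at this position.
    parser.add_argument("epsilon", type=float, help="epsilon of the Huber loss")
    args = parser.parse_args(argv)  # argv=None means "read sys.argv[1:]"
    # A negative Huber threshold is meaningless, so reject it with a clear message.
    if args.epsilon < 0:
        parser.error("epsilon must be non-negative")
    return args.epsilon


# In main.py you would call parse_epsilon() with no argument to read sys.argv.
hyper_epsilon = parse_epsilon(["0.01"])
print(f"Epsilon is {hyper_epsilon}")
```

argparse also gives the script a --help message for free, which is handy once the script grows more hyper-parameters.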

Note

The workflow presented in the script is deliberately simpler than real-world ones. The objective of this example is to illustrate the basic steps involved in using Oríon.

To find a good epsilon, a user would otherwise run $ python main.py <epsilon> many times, picking each new value of epsilon by hand.

This ad-hoc hyper-parameter optimization is unreliable, slow, and requires a lot of work from the user. Oríon solves this problem by providing established hyper-parameter optimization algorithms without disrupting the user’s workflow. Integrating it requires only minimal adjustments, as we’ll demonstrate in the next section.

Enter Oríon

Integrating Oríon into your workflow requires only two non-invasive changes:
  1. Define an objective to optimize.

  2. Specify the hyper-parameter space.

The objective is defined in the script that trains the model. The hyper-parameter space can be specified either in a configuration file or directly on the command line when calling the script through Oríon. For this example, we’ll pass the hyper-parameter space directly as a command-line argument.

Updating the script

We only need to make one small change to the script: at the end, we report the objective we want to minimize to Oríon using orion.client.report_objective():

report_objective(1 - accuracy)

In our example, the model’s performance is measured by its accuracy. Since Oríon minimizes the reported objective, we report 1 - accuracy (the balanced error rate): driving it toward 0 drives the accuracy toward 1. Reporting the accuracy itself would make Oríon minimize it, which would yield a poor model.
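To make the reported number concrete, the sketch below recomputes balanced accuracy in plain Python as the mean of per-class recalls (what scikit-learn’s balanced_accuracy_score computes with its default settings) and turns it into the objective 1 - accuracy. The function names balanced_accuracy and objective are hypothetical and the labels are toy values; in main.py you would keep using balanced_accuracy_score.

```python
from collections import Counter
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall over the classes present in y_true (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [hits[c] / n for c, n in support.items()]
    return sum(recalls) / len(recalls)


def objective(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Value to minimize: 0.0 for a perfect classifier, larger for worse ones."""
    return 1.0 - balanced_accuracy(y_true, y_pred)


# Imbalanced toy labels: plain accuracy is 5/6, but class 1 has a recall of only 0.5.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.75
print(objective(y_true, y_pred))  # 0.25
```

Averaging recalls per class keeps a classifier from looking good simply by favouring the majority class, and subtracting from 1 turns "higher is better" into the "lower is better" quantity that Oríon minimizes.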

orion.client.report_objective() can be imported using:

from orion.client import report_objective
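When iterating on main.py by hand, it can be convenient for the script to keep running on a machine where Oríon is not installed. One possible pattern, sketched below, falls back to a local function that only prints the objective. The fallback is hypothetical, is not part of Oríon, and only takes effect when the import fails.

```python
try:
    from orion.client import report_objective
except ImportError:
    # Hypothetical fallback used only when Oríon is not installed:
    # print the value instead of reporting it.
    def report_objective(objective: float) -> None:
        """Stand-in for orion.client.report_objective (prints only)."""
        print(f"Objective (not reported to Oríon): {objective}")
```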

Updating the script call

The last missing piece in automating the hyper-parameter optimization of our example model is to supply Oríon with the values to use for epsilon.

We specify the search space on the command line by passing orion~loguniform(1e-5, 1.0) as the argument for epsilon. This tells Oríon to draw values of epsilon from a log-uniform distribution between 1e-5 and 1.
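A log-uniform prior spreads samples evenly across orders of magnitude: each decade between 1e-5 and 1 (1e-5 to 1e-4, 1e-4 to 1e-3, and so on) receives about the same share of draws, whereas a plain uniform prior over the same range would put roughly 90% of its draws above 0.1. The standard-library sketch below illustrates the idea by sampling uniformly in log space and exponentiating. It is not Oríon’s implementation, and sample_loguniform is a hypothetical name.

```python
import math
import random
from typing import List


def sample_loguniform(low: float, high: float, n: int, seed: int = 0) -> List[float]:
    """Draw n values whose natural log is uniform on [log(low), log(high)]."""
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    rng = random.Random(seed)  # seeded for reproducibility
    log_low, log_high = math.log(low), math.log(high)
    return [math.exp(rng.uniform(log_low, log_high)) for _ in range(n)]


samples = sample_loguniform(1e-5, 1.0, n=10_000)

# Count the draws per decade: index 0 covers roughly 0.1 to 1, index 4 covers 1e-5 to 1e-4.
per_decade = [0] * 5
for x in samples:
    decade = min(int(-math.log10(x)), 4)
    per_decade[decade] += 1
print(per_decade)  # each count should be roughly 2000
```

This is why a log-uniform prior suits scale-like hyper-parameters such as epsilon: small values like 1e-4 get explored as often as values near 1.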

Putting everything together, we call main.py through Oríon with the following syntax: $ orion hunt python main.py 'orion~loguniform(1e-5, 1.0)'. Before running it in your terminal, you must also name the experiment with the -n option. It is also a good idea to set a stopping condition with --max-trials; otherwise, the optimization will run until you interrupt it with Ctrl-C:

$ orion hunt -n scikit-iris-tutorial --max-trials 50 python main.py 'orion~loguniform(1e-5, 1.0)'

Warning

Make sure you have installed the script’s dependencies with pip install -r requirements.txt before running it.

Viewing the results

Once the optimization has reached its stopping condition, you can ask Oríon for the results with the sub-command $ orion info:

$ orion info -n scikit-iris-tutorial

You can also query the results from the database using Oríon’s Python API; see its documentation for more details and examples.
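Conceptually, orion hunt with --max-trials 50 repeats a suggest, evaluate, record cycle until the trial budget is used up, and orion info then summarizes the recorded trials. The toy loop below mimics that cycle with plain random search over the same log-uniform space. It is only an illustration, not Oríon’s algorithm or API: toy_objective is a hypothetical, cheap stand-in for running main.py, and random_search is a hypothetical name.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def toy_objective(epsilon: float) -> float:
    """Hypothetical stand-in for `python main.py <epsilon>`; lowest at epsilon = 1e-2."""
    return (math.log10(epsilon) + 2.0) ** 2 / 10.0


def random_search(
    objective: Callable[[float], float],
    low: float,
    high: float,
    max_trials: int,
    seed: int = 0,
) -> Tuple[Dict[str, float], List[Dict[str, float]]]:
    """Evaluate max_trials log-uniform samples and return (best_trial, all_trials)."""
    rng = random.Random(seed)
    trials: List[Dict[str, float]] = []
    for _ in range(max_trials):  # stopping condition, analogous to --max-trials
        epsilon = math.exp(rng.uniform(math.log(low), math.log(high)))  # suggest
        trials.append({"epsilon": epsilon, "objective": objective(epsilon)})  # evaluate and record
    best = min(trials, key=lambda t: t["objective"])  # lower objective is better
    return best, trials


best, trials = random_search(toy_objective, 1e-5, 1.0, max_trials=50)
print(f"{len(trials)} trials, best epsilon {best['epsilon']:.2e}, objective {best['objective']:.4f}")
```

Random search is used here only because it is the simplest loop to read; Oríon provides its own optimization algorithms, and its trials run the real script rather than a closed-form stand-in.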