Classification Example
This example will show you how to use ConfOpt to optimize hyperparameters for a classification task.
If you have already used hyperparameter tuning packages, the “Code Example” section below gives a quick run-through of how to use ConfOpt. If not, don’t worry: the “Detailed Walkthrough” section explains everything step by step.
Code Example
Set up search space and objective function:
from confopt.tuning import ConformalTuner
from confopt.wrapping import IntRange, FloatRange, CategoricalRange
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
search_space = {
    'n_estimators': IntRange(min_value=50, max_value=200),
    'max_features': FloatRange(min_value=0.1, max_value=1.0),
    'criterion': CategoricalRange(choices=['gini', 'entropy', 'log_loss'])
}
def objective_function(configuration):
    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )
    model = RandomForestClassifier(
        n_estimators=configuration['n_estimators'],
        max_features=configuration['max_features'],
        criterion=configuration['criterion'],
        random_state=42
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    score = accuracy_score(y_test, predictions)
    return score
Call ConfOpt to tune hyperparameters:
tuner = ConformalTuner(
    objective_function=objective_function,
    search_space=search_space,
    minimize=False
)
tuner.tune(
    max_searches=50,
    n_random_searches=10,
    verbose=True
)
Extract results:
best_params = tuner.get_best_params()
best_accuracy = tuner.get_best_value()
tuned_model = RandomForestClassifier(**best_params, random_state=42)
Detailed Walkthrough
Imports
First, let’s import everything we’ll be needing:
from confopt.tuning import ConformalTuner
from confopt.wrapping import IntRange, FloatRange, CategoricalRange
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
For this tutorial, we’ll be using the sklearn Wine dataset and trying to tune the hyperparameters of a RandomForestClassifier.
Search Space
Next, we need to define the hyperparameter space we want confopt to optimize over.
This is done using the IntRange, FloatRange, and CategoricalRange classes, which specify the ranges for each hyperparameter. Below let’s define a simple example with one of each type of hyperparameter:
search_space = {
    'n_estimators': IntRange(min_value=50, max_value=200),
    'max_features': FloatRange(min_value=0.1, max_value=1.0),
    'criterion': CategoricalRange(choices=['gini', 'entropy', 'log_loss'])
}
This tells confopt to explore the following hyperparameter ranges:
n_estimators: Number of trees in the forest (all integer values from 50 to 200)
max_features: Fraction of features to consider at each split (any float between 0.1 and 1.0)
criterion: Function to measure the quality of a split (choose from ‘gini’, ‘entropy’, or ‘log_loss’)
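To make the ranges above concrete, here is a minimal plain-Python sketch of what one configuration drawn from this space could look like. It uses only the standard library: `illustrative_space` and `sample_configuration` are hypothetical names invented for this illustration, uniform sampling and inclusive integer bounds are assumptions, and none of this reflects how ConfOpt represents or samples ranges internally.

```python
import random

# Plain-Python stand-in for the search space above (hypothetical, not ConfOpt's
# own representation): each entry is a (kind, specification) pair.
illustrative_space = {
    "n_estimators": ("int", (50, 200)),
    "max_features": ("float", (0.1, 1.0)),
    "criterion": ("categorical", ["gini", "entropy", "log_loss"]),
}


def sample_configuration(space, rng):
    """Draw one value per hyperparameter, uniformly within its range."""
    configuration = {}
    for name, (kind, spec) in space.items():
        if kind == "int":
            low, high = spec
            configuration[name] = rng.randint(low, high)  # both bounds inclusive
        elif kind == "float":
            low, high = spec
            configuration[name] = rng.uniform(low, high)
        else:
            configuration[name] = rng.choice(spec)
    return configuration


rng = random.Random(0)  # seeded so the draw is reproducible
example_configuration = sample_configuration(illustrative_space, rng)
print(example_configuration)
```

The result is a dictionary with one value per hyperparameter name, keyed by the same names used in the search space.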
Objective Function
The objective function defines how the model trains and what metric you want to optimize for during hyperparameter search:
def objective_function(configuration):
    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )
    model = RandomForestClassifier(
        n_estimators=configuration['n_estimators'],
        max_features=configuration['max_features'],
        criterion=configuration['criterion'],
        random_state=42
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    score = accuracy_score(y_test, predictions)
    return score
The objective function must take a single argument called configuration, which is a dictionary containing a hyperparameter value for each hyperparameter name specified in your search_space. The values will be chosen automatically by the tuner during optimization.
The score can be any metric of your choosing (e.g., accuracy, log loss, F1 score, etc.). This is the value that confopt will try to optimize for.
In this example, the data is loaded and split inside the objective function for simplicity. You may prefer to load the data outside the function (to avoid reloading it for each configuration) and either pass the training and test sets as arguments using partial from the functools library, or reference them from the global scope.
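The partial approach could look roughly like the sketch below. The name `objective_with_data` and the hand-written configuration at the end are illustrative assumptions, not part of ConfOpt; the data is loaded and split once, then bound to the function so that only `configuration` remains as an argument.

```python
from functools import partial

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load and split the data once, outside the objective function.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)


def objective_with_data(configuration, X_train, X_test, y_train, y_test):
    """Train on the provided split and return the test accuracy."""
    model = RandomForestClassifier(
        n_estimators=configuration['n_estimators'],
        max_features=configuration['max_features'],
        criterion=configuration['criterion'],
        random_state=42
    )
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Bind the data arguments; the result takes only `configuration`.
objective_function = partial(
    objective_with_data,
    X_train=X_train,
    X_test=X_test,
    y_train=y_train,
    y_test=y_test,
)

# Quick manual check with a hand-written (illustrative) configuration.
example_score = objective_function(
    {'n_estimators': 100, 'max_features': 0.5, 'criterion': 'gini'}
)
print(example_score)
```

Calling the objective once by hand like this is also a cheap way to catch errors before starting a longer tuning run.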
Running the Optimization
To start optimizing, first instantiate a ConformalTuner by providing your objective function, search space, and the optimization direction:
tuner = ConformalTuner(
    objective_function=objective_function,
    search_space=search_space,
    minimize=False  # Use True for metrics like log loss
)
The minimize parameter should be set to False if you want to maximize your metric (e.g., accuracy), or True if you want to minimize it (e.g., log loss).
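For a metric that should be minimized, the objective function returns the loss instead of the accuracy. The following is a minimal sketch using scikit-learn's log_loss; the name `log_loss_objective` and the hand-written configuration at the end are illustrative assumptions rather than part of ConfOpt.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split


def log_loss_objective(configuration):
    """Train a random forest and return its log loss on the test split."""
    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )
    model = RandomForestClassifier(
        n_estimators=configuration['n_estimators'],
        max_features=configuration['max_features'],
        criterion=configuration['criterion'],
        random_state=42
    )
    model.fit(X_train, y_train)
    # Log loss is computed from class probabilities, not hard labels.
    probabilities = model.predict_proba(X_test)
    return log_loss(y_test, probabilities, labels=model.classes_)


# Manual check with a hand-written (illustrative) configuration.
example_loss = log_loss_objective(
    {'n_estimators': 100, 'max_features': 0.5, 'criterion': 'gini'}
)
print(example_loss)
```

An objective like this would be paired with minimize=True when instantiating the tuner, since lower log loss is better.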
To actually kickstart the hyperparameter search, call:
tuner.tune(
    max_searches=50,
    n_random_searches=10,
    verbose=True
)
Where:
max_searches controls how many different hyperparameter configurations will be tried in total.
n_random_searches sets how many of those will be chosen randomly before the tuner switches to using smart optimization (e.g., max_searches=50 and n_random_searches=10 means the tuner will sample 10 random configurations, then 40 smart configurations).
Getting the Results
Once tuning finishes, you can retrieve the best hyperparameters and the best score found using get_best_params() and get_best_value(), respectively:
best_params = tuner.get_best_params()
best_accuracy = tuner.get_best_value()
You can then use the best hyperparameters to instantiate a tuned version of your model:
tuned_model = RandomForestClassifier(**best_params, random_state=42)
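Note that the instantiated model is not yet fitted. A minimal sketch of fitting and evaluating it follows; the `best_params` dictionary here is a hypothetical placeholder standing in for the output of tuner.get_best_params(), not a real tuning result.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical placeholder; in practice use best_params = tuner.get_best_params().
best_params = {'n_estimators': 120, 'max_features': 0.4, 'criterion': 'entropy'}

# Recreate the same split used inside the objective function.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Instantiate the tuned model, then fit it before making predictions.
tuned_model = RandomForestClassifier(**best_params, random_state=42)
tuned_model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, tuned_model.predict(X_test))
print(test_accuracy)
```

Because the same test split was used to score configurations during tuning, this accuracy is likely to be optimistic; a separate held-out set gives a fairer estimate of final performance.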