Regression Example

This example will show you how to use ConfOpt to optimize hyperparameters for a regression task.

If you have already used hyperparameter tuning packages, the “Code Example” section below gives a quick run-through of how to use ConfOpt. If not, don’t worry: the “Detailed Walkthrough” section explains everything step by step.

Code Example

Set up search space and objective function:

from confopt.tuning import ConformalTuner
from confopt.wrapping import IntRange, FloatRange, CategoricalRange
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

search_space = {
    'n_estimators': IntRange(min_value=50, max_value=200),
    'max_depth': IntRange(min_value=3, max_value=15),
    'min_samples_split': IntRange(min_value=2, max_value=10)
}

def objective_function(configuration):
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    model = RandomForestRegressor(
        n_estimators=configuration['n_estimators'],
        max_depth=configuration['max_depth'],
        min_samples_split=configuration['min_samples_split'],
        random_state=42
    )

    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    return mse  # Lower is better (minimize MSE)

Call ConfOpt to tune hyperparameters:

tuner = ConformalTuner(
    objective_function=objective_function,
    search_space=search_space,
    minimize=True  # Minimizing MSE
)

tuner.tune(
    max_searches=50,
    n_random_searches=10,
    verbose=True
)

Extract results:

best_params = tuner.get_best_params()
best_mse = tuner.get_best_value()

tuned_model = RandomForestRegressor(**best_params, random_state=42)

Detailed Walkthrough

Imports

First, let’s import everything we need:

from confopt.tuning import ConformalTuner
from confopt.wrapping import IntRange, FloatRange, CategoricalRange

from sklearn.ensemble import RandomForestRegressor

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

For this tutorial, we’ll use scikit-learn’s diabetes dataset and tune the hyperparameters of a RandomForestRegressor.

Search Space

Next, we need to define the hyperparameter space we want ConfOpt to optimize over.

This is done using the IntRange, FloatRange, and CategoricalRange classes, which specify the ranges for each hyperparameter.

Let’s define a simple search space with a few typical RandomForestRegressor hyperparameters:

search_space = {
    'n_estimators': IntRange(min_value=50, max_value=200),
    'max_depth': IntRange(min_value=3, max_value=15),
    'min_samples_split': IntRange(min_value=2, max_value=10)
}

This tells ConfOpt to explore the following hyperparameter ranges:

n_estimators: Number of trees in the forest (all integer values from 50 to 200)
max_depth: Maximum tree depth (all integer values from 3 to 15)
min_samples_split: Minimum samples to split a node (all integer values from 2 to 10)
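To get a feel for how large this space is, the following sketch counts the configurations the three ranges span. It uses only plain Python, with the bounds copied from the search space above. The `bounds` dictionary and the `count_configurations` helper are illustrative names, not part of ConfOpt, and the count assumes both bounds are inclusive, as the descriptions above state.

```python
from math import prod

# Inclusive integer bounds copied from the search space above.
bounds = {
    "n_estimators": (50, 200),
    "max_depth": (3, 15),
    "min_samples_split": (2, 10),
}


def count_configurations(bounds):
    """Count distinct integer configurations, treating both bounds as inclusive."""
    return prod(high - low + 1 for low, high in bounds.values())


total = count_configurations(bounds)
print(total)  # 151 * 13 * 9 = 17667
```

Trying every combination would mean 17,667 model fits, which is why the tuner evaluates only a limited number of configurations.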

Objective Function

The objective function defines how the model trains and what metric you want to optimize for during hyperparameter search:

def objective_function(configuration):
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    model = RandomForestRegressor(
        n_estimators=configuration['n_estimators'],
        max_depth=configuration['max_depth'],
        min_samples_split=configuration['min_samples_split'],
        random_state=42
    )

    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    return mse  # Lower is better (minimize MSE)

The objective function must take a single argument called configuration, which is a dictionary containing a value for each hyperparameter name specified in your search_space. The values will be chosen automatically by the tuner during optimization.

The returned value can be any metric of your choosing (e.g., MSE, R², or MAE). This is the value that ConfOpt will try to optimize. For MSE, lower is better, so we minimize it.

In this example, the data is loaded and split inside the objective function for simplicity. You may prefer to load it once outside the function, so it is not reloaded for every configuration, and then either bind the training and test sets as arguments using partial from the functools library or reference them from the global scope.
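The functools approach could look roughly like the sketch below. The helper name `evaluate_configuration` is hypothetical. The data arguments are bound positionally so that the resulting callable exposes only a `configuration` parameter; whether ConfOpt would also accept keyword-bound arguments is not something this sketch verifies.

```python
from functools import partial

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load and split the data once, outside the objective function.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


def evaluate_configuration(X_train, X_test, y_train, y_test, configuration):
    """Train a forest with the given hyperparameters and return the test MSE."""
    model = RandomForestRegressor(
        n_estimators=configuration['n_estimators'],
        max_depth=configuration['max_depth'],
        min_samples_split=configuration['min_samples_split'],
        random_state=42,
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return mean_squared_error(y_test, predictions)


# Bind the four data arguments; only `configuration` remains to be supplied.
objective_function = partial(
    evaluate_configuration, X_train, X_test, y_train, y_test
)

# Quick check with one hand-picked configuration.
example_mse = objective_function(
    {'n_estimators': 50, 'max_depth': 5, 'min_samples_split': 2}
)
print(example_mse)
```

The resulting `objective_function` can be passed to the tuner in the same way as the version that loads the data internally.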

Running the Optimization

To start optimizing, first instantiate a ConformalTuner by providing your objective function, search space, and the optimization direction:

tuner = ConformalTuner(
    objective_function=objective_function,
    search_space=search_space,
    minimize=True  # Minimizing MSE
)

The minimize parameter should be set to True to minimize metrics where lower is better (e.g., MSE, MAE), or False to maximize metrics where higher is better (e.g., R²).
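As an illustration of the maximization case, here is a sketch of an objective that returns R² instead of MSE. The name `r2_objective_function` is hypothetical, and the tuner call is shown only as a comment because it is not executed here.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def r2_objective_function(configuration):
    """Same training setup as the MSE objective, scored with R² (higher is better)."""
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    model = RandomForestRegressor(
        n_estimators=configuration['n_estimators'],
        max_depth=configuration['max_depth'],
        min_samples_split=configuration['min_samples_split'],
        random_state=42,
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return r2_score(y_test, predictions)


# With this objective, the tuner would be created with minimize=False:
# tuner = ConformalTuner(
#     objective_function=r2_objective_function,
#     search_space=search_space,
#     minimize=False,  # Maximizing R²
# )

# Quick check with one hand-picked configuration.
example_r2 = r2_objective_function(
    {'n_estimators': 50, 'max_depth': 5, 'min_samples_split': 2}
)
print(example_r2)
```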

To start the hyperparameter search, call:

tuner.tune(
    max_searches=50,
    n_random_searches=10,
    verbose=True
)

Where:

max_searches controls how many different hyperparameter configurations will be tried in total.
n_random_searches sets how many of those will be chosen randomly before the tuner switches to guided search, where it uses the results gathered so far to choose the next configuration (e.g., max_searches=50 and n_random_searches=10 means the tuner will evaluate 10 random configurations, then 40 guided ones).
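To make the budget split concrete, the sketch below draws the random-phase configurations by hand and computes how many searches remain for the second phase. This is a plain-Python illustration, not ConfOpt's own sampler: the `bounds` dictionary, the `sample_random_configuration` helper, and the choice of uniform sampling are all assumptions made for the example.

```python
import random

# Inclusive integer bounds matching the search space used in this tutorial.
bounds = {
    "n_estimators": (50, 200),
    "max_depth": (3, 15),
    "min_samples_split": (2, 10),
}

max_searches = 50
n_random_searches = 10


def sample_random_configuration(bounds, rng):
    """Draw one configuration uniformly at random from inclusive integer bounds."""
    return {name: rng.randint(low, high) for name, (low, high) in bounds.items()}


rng = random.Random(0)  # Seeded so the draws are reproducible.
random_configurations = [
    sample_random_configuration(bounds, rng) for _ in range(n_random_searches)
]

# Searches left over once the random phase is done.
n_remaining_searches = max_searches - n_random_searches

print(len(random_configurations), n_remaining_searches)  # 10 40
```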

Getting the Results

Once the search finishes, you can retrieve the best hyperparameters with get_best_params() and the best score found with get_best_value():

best_params = tuner.get_best_params()
best_mse = tuner.get_best_value()

You can then use the best hyperparameters to instantiate a tuned version of your model:

tuned_model = RandomForestRegressor(**best_params, random_state=42)
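The instantiated model still has to be fitted before it can make predictions. A minimal sketch of that last step follows. The `best_params` values here are placeholders standing in for whatever `tuner.get_best_params()` returns; they are not actual tuning results.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Placeholder values standing in for tuner.get_best_params(); not real results.
best_params = {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 4}

# Recreate the same split the objective function used.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the tuned model and score it on the held-out split.
tuned_model = RandomForestRegressor(**best_params, random_state=42)
tuned_model.fit(X_train, y_train)
predictions = tuned_model.predict(X_test)

tuned_mse = mean_squared_error(y_test, predictions)
tuned_r2 = r2_score(y_test, predictions)
print("MSE:", tuned_mse)
print("R²:", tuned_r2)
```

Note that this reuses the split the objective function scored against, so the reported error is not an independent estimate of performance on unseen data.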