Optuna

Optuna is a Python package for tuning hyperparameters in machine learning. It can be integrated with frameworks such as PyTorch, TensorFlow, Keras, and scikit-learn (see the official documentation for more).

In this post, I will introduce the basic workflow and give a panoramic overview of Optuna. The more trivial and the more intricate functions are left to interested readers.

Main components

There are four components in the Optuna package:

  • Hyperparameters
  • Objectives
  • Trial
  • Study

Hyperparameters

Hyperparameters are a set of parameters, each with its own search scope, such as the learning rate, batch size, etc. We want to find the combination of them that achieves the optimal objective value.

Objectives

Objectives are the performance criteria used to evaluate potential hyperparameter combinations, such as accuracy, test loss, etc.

Trial

A trial is a single evaluation of one possible combination: you set the hyperparameters to specific values, then test and record the corresponding performance on the objectives.

Study

A study is a set of trials; it records all the trials you have executed and the best hyperparameter combination found so far.
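
To make these four components concrete, here is a minimal end-to-end sketch using Optuna's standard API; the hyperparameter names, search ranges, and toy objective are made up for illustration.

import optuna


def objective(trial):
    # Hyperparameters: ask the trial for values within their search scopes.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_int("batch_size", 16, 256)

    # Objective: normally you would train and evaluate a model here;
    # this toy expression just stands in for a validation loss.
    return (lr - 1e-3) ** 2 + batch_size * 1e-4


# Study: creates and runs trials and records their results.
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)

print(study.best_params, study.best_value)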

Workflow

Given the set of hyperparameters and their search scopes, how do we choose target values in each trial?

With a relatively small search space, enumeration may be a feasible solution. However, if a parameter is continuous, how do we pick values efficiently?

Fortunately, Optuna provides a Sampler class to efficiently search the space and generate a possible combination for each trial. The algorithms behind the sampling are described in Optuna: efficient optimization algorithms.

Actually, if we are not researchers in this field, we do not have to know how these sampling algorithms work or why they are efficient. Knowing how to pick a suitable algorithm is more important.
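
In practice, picking an algorithm just means passing a sampler instance when creating the study. A short sketch using samplers that ship with Optuna:

import optuna

# TPESampler is Optuna's default; the commented lines are drop-in alternatives.
sampler = optuna.samplers.TPESampler(seed=42)
# sampler = optuna.samplers.RandomSampler(seed=42)
# sampler = optuna.samplers.CmaEsSampler(seed=42)

study = optuna.create_study(direction="minimize", sampler=sampler)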

In all sampling algorithms, there are two sampling types (a sketch of the corresponding sampler hooks follows the list):

  • Relative sampling. In relative sampling, there are relations among parameters, and we determine the values of multiple parameters simultaneously.
  • Independent sampling. In independent sampling, parameters are independent, and we determine the value of a single parameter at a time.
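
These two types correspond to the two sampling hooks that every sampler implements. The skeleton below is a stripped-down sketch of that interface, not a working sampler; the class name is made up, and the real base class is optuna.samplers.BaseSampler.

from typing import Any, Dict

from optuna.distributions import BaseDistribution
from optuna.samplers import BaseSampler
from optuna.study import Study
from optuna.trial import FrozenTrial


class MySampler(BaseSampler):
    def infer_relative_search_space(
        self, study: Study, trial: FrozenTrial
    ) -> Dict[str, BaseDistribution]:
        # Decide which parameters take part in relative sampling for this trial.
        return {}

    def sample_relative(
        self, study: Study, trial: FrozenTrial, search_space: Dict[str, BaseDistribution]
    ) -> Dict[str, Any]:
        # Relative sampling: determine values of several related parameters at once.
        return {}

    def sample_independent(
        self,
        study: Study,
        trial: FrozenTrial,
        param_name: str,
        param_distribution: BaseDistribution,
    ) -> Any:
        # Independent sampling: determine the value of a single parameter.
        raise NotImplementedError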

The following figure illustrates the mechanism behind Optuna.


The corresponding source code is in the Trial._suggest method.

  1. At the beginning, a Study object creates a Trial object. When the Trial object is initialized, its constructor invokes the pre-defined Sampler object to determine the relative search space and carry out relative sampling, which returns parameter values for the related parameters.
  2. While the performance is being tested in a trial, the Trial provides suggest methods to pick parameter values, which are generated by the Sampler based on a distribution (uniform, log-uniform, categorical, etc.).
  3. Repeat the above two steps until the pre-defined number of trials is reached or an error occurs. Over the sequence of trials, the optimization algorithm in the Sampler narrows down the search space and improves efficiency. (A toy sketch of this loop follows the list.)
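
The toy re-implementation below mimics this interplay between study, trial, and sampler. It is purely illustrative: the class and attribute names (ToyStudy, ToyTrial, relative_params, etc.) are made up, and the logic is heavily simplified compared with the real Trial._suggest.

import random


class ToySampler:
    def infer_relative_search_space(self, study, trial):
        # This toy sampler does no relative sampling.
        return {}

    def sample_relative(self, study, trial, search_space):
        return {}

    def sample_independent(self, study, trial, name, low, high):
        # Independent sampling: one parameter at a time, uniformly at random.
        return random.uniform(low, high)


class ToyTrial:
    def __init__(self, study):
        self.study = study
        self.params = {}
        # Step 1: relative sampling happens while the trial is constructed.
        space = study.sampler.infer_relative_search_space(study, self)
        self.relative_params = study.sampler.sample_relative(study, self, space)

    def suggest_uniform(self, name, low, high):
        # Step 2: a suggest call either reuses the relative sample or falls back
        # to independent sampling (cf. Trial._suggest).
        if name in self.relative_params:
            value = self.relative_params[name]
        else:
            value = self.study.sampler.sample_independent(self.study, self, name, low, high)
        self.params[name] = value
        return value


class ToyStudy:
    def __init__(self, sampler):
        self.sampler = sampler
        self.trials = []

    def optimize(self, objective, n_trials):
        # Step 3: repeat until the pre-defined number of trials is reached.
        for _ in range(n_trials):
            trial = ToyTrial(self)
            value = objective(trial)
            self.trials.append((trial.params, value))


study = ToyStudy(ToySampler())
study.optimize(lambda t: t.suggest_uniform("x", -1.0, 1.0) ** 2, n_trials=5)
print(min(study.trials, key=lambda pair: pair[1]))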

An Example of a Sampler: GridSampler

Finally, I will analyze the source code of GridSampler, a class derived from the base Sampler.

There are three main functions in this class:

  • __init__
  • sample_relative
  • sample_independent

__init__

The most important part in the constructor is the _all_grids attribute.

def __init__(
    self, search_space: Mapping[str, Sequence[GridValueType]], seed: Optional[int] = None
) -> None:
    # Sort parameters by name so that the grid layout is deterministic.
    self._search_space = collections.OrderedDict()
    for param_name, param_values in sorted(search_space.items()):
        self._search_space[param_name] = list(param_values)

    # Each grid point is one element of the Cartesian product of the value lists.
    self._all_grids = list(itertools.product(*self._search_space.values()))
    self._param_names = sorted(search_space.keys())
    self._n_min_trials = len(self._all_grids)
    self._rng = np.random.RandomState(seed)
The _all_grids attribute contains the Cartesian product of the parameter value lists in the GridSampler.
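
A small standalone example shows what this Cartesian product looks like for a hypothetical two-parameter search space:

import collections
import itertools

search_space = {"lr": [0.01, 0.1], "batch_size": [32, 64]}

# Mirror the constructor: sort by parameter name, then take the product.
ordered = collections.OrderedDict(sorted(search_space.items()))
all_grids = list(itertools.product(*ordered.values()))
print(all_grids)
# [(32, 0.01), (32, 0.1), (64, 0.01), (64, 0.1)]  (names sorted: batch_size, lr)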

sample_relative

The sample_relative function first collects all unvisited grids, randomly chooses one of them, and registers it (together with the search space) as system attributes of the trial through the study's storage.

def sample_relative(
    self, study: Study, trial: FrozenTrial, search_space: Dict[str, BaseDistribution]
) -> Dict[str, Any]:
    # Collect the ids of grid points that no trial has visited yet.
    target_grids = self._get_unvisited_grid_ids(study)

    # Randomly pick one of the unvisited grid points.
    grid_id = int(self._rng.choice(target_grids))

    # Store the chosen grid point as system attributes of this trial,
    # so that sample_independent can look it up later.
    study._storage.set_trial_system_attr(trial._trial_id, "search_space", self._search_space)
    study._storage.set_trial_system_attr(trial._trial_id, "grid_id", grid_id)

    # The actual parameter values are returned by sample_independent, not here.
    return {}

sample_independent

The sample_independent method reads the grid_id stored in the trial's system attributes and returns the corresponding value for the given param_name.

def sample_independent(
    self,
    study: Study,
    trial: FrozenTrial,
    param_name: str,
    param_distribution: BaseDistribution,
) -> Any:
    # Look up the grid point that sample_relative chose for this trial...
    grid_id = trial.system_attrs["grid_id"]
    # ...and return the value of the requested parameter at that grid point.
    param_value = self._all_grids[grid_id][self._param_names.index(param_name)]
    return param_value
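
Putting it together, using GridSampler only requires passing a search space when creating the study. The parameter names and values below are made up for illustration:

import optuna

search_space = {"lr": [0.01, 0.1], "batch_size": [32, 64]}


def objective(trial):
    lr = trial.suggest_categorical("lr", search_space["lr"])
    batch_size = trial.suggest_categorical("batch_size", search_space["batch_size"])
    return (lr - 0.05) ** 2 + batch_size * 1e-3


study = optuna.create_study(sampler=optuna.samplers.GridSampler(search_space))
# Four grid points, so four trials exhaust the grid.
study.optimize(objective, n_trials=4)
print(study.best_params)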