Advanced Tabular Model Options

Modified on Tue, 18 Jul, 2023 at 2:20 PM

No model comparison
Cross Validation
- Cross validation options
- Cross validation process
Cross Validation and Hyperparameter Optimisation
How the models apply these settings

All tabular models provide Advanced Options. For most models the Advanced Options panel offers three modes: No model comparison, Cross Validation, and Cross Validation and Hyperparameter Optimisation (see the last section for exceptions).

No model comparison

This is the default setting for each model if you don’t change the advanced settings. With this option selected, the advanced options expose all available hyperparameters and their default values (unless you already changed the values). You can change any hyperparameter and train the model with those changes applied.

With this option enabled, a single model is trained with the given set of hyperparameters. Training uses the entire dataset selected in the model’s Data field.

Cross Validation

If you select Cross Validation, hyperparameters behave as described above: a single set of hyperparameters is used for model training.

Cross validation options

If cross validation is selected, another set of options is displayed at the bottom of the Advanced Options panel. Refer to this article on Cross Validation for the meaning of these options.

Cross Validation Splitting Strategy	Two options are available: K-Fold (default setting) or Group K-Fold. Refer to this article to learn about these.
Column to group K-fold CV on	Only available if Group K-Fold is selected. Select the column of your dataset by which the folds should be grouped.
Number of Folds	The default value is 5 (K = 5), which is a reasonable choice in most cases.
Cross-validation scoring metric	The cross validation is scored with this metric. Available options are: Root Mean Squared Error (default), Mean Squared Error, Mean Absolute Error, Max Error, Explained Variance.
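The five scoring metrics can be illustrated with a short sketch. It uses the conventional textbook definitions in plain Python; the function names and sample values are assumptions made for illustration, and the platform's own implementation may differ in detail.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error (same unit as the target)."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def max_error(y_true, y_pred):
    """Largest absolute residual."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))

def _variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); 1.0 corresponds to a perfect fit."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - _variance(residuals) / _variance(y_true)

# Illustrative values only
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

print(mean_squared_error(y_true, y_pred))    # 2.75
print(mean_absolute_error(y_true, y_pred))   # 1.25
print(max_error(y_true, y_pred))             # 3.0
print(explained_variance(y_true, y_pred))    # 0.5625
```

In these conventional definitions the four error metrics are better when smaller, while Explained Variance is better when larger. The article does not state how the platform orients that score for comparison.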

Cross validation process

Once all parameters are set as required and you click Apply, cross validation runs the following process:

The dataset will be split into K random subsets.
The model is trained K times, each time with a different train-test split of the folds. In each run one fold is held back for testing, and a different fold is held back in each of the K runs.
For each of the K training runs a cross validation score is calculated on the unseen test data.
The K models are not stored or published. Only their cross validation scores are stored and made available to other steps. You can use the function Model Evaluation to view the cross validation scores of a model with cross validation enabled.
After cross validation has finished, the model is trained one more time on the entire dataset. This is the model made available by the function. It is the same model you would get with No model comparison, because this final training is identical to model training in that mode.
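The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the stand-in model is a one-feature least-squares line, the score is RMSE, and all function names (`k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical.

```python
import math
import random

def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Stand-in model: ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x

def rmse(model, xs, ys):
    """Root mean squared error of the line on the given rows."""
    a, b = model
    return math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs))

def cross_validate(xs, ys, k=5, seed=0):
    """Return the k fold scores and the final model trained on all rows."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Train on k - 1 folds, score on the fold that was held back.
        fold_model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(rmse(fold_model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
        # The fold model is discarded here; only its score is kept.
    # Final step: one more training run on the entire dataset.
    final_model = fit_line(xs, ys)
    return scores, final_model

# Illustrative data: a noisy straight line
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
scores, final_model = cross_validate(xs, ys, k=5)
print(len(scores), final_model)
```

Because the final model is fitted on all rows regardless of the fold scores, it is identical to what a plain training run without cross validation would produce.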

Cross Validation and Hyperparameter Optimisation

If you select Hyperparameter Optimisation (HPO), the options panel changes. Instead of a single value, you can select a list of values for each hyperparameter. Each hyperparameter is pre-filled with default settings.

Single values for hyperparameters

List of values for hyperparameters

Hyperparameter optimisation options

In the example above you see the settings for a Neural Network. We don’t discuss the specific hyperparameters of that model here but just the general options and how to set up the hyperparameter optimisation.

Search Method

Two methods are available to run the optimisation.

Randomised Search: This is the faster method and the default setting. From all possible hyperparameter value combinations, N random combinations are chosen and evaluated. The best of those combinations is returned as the final model. You might not find the global optimum with this approach, but because it is much faster than the exhaustive search you can run the HPO several times. If the resulting model layout is similar each time, that indicates which kind of model layout suits your specific use case.
Exhaustive Search: This method will try all possible combinations of values. Therefore, it is very time consuming and might not be possible at all due to memory restrictions. If you actually want to do an exhaustive search you might want to limit the possible combinations to reduce the time and memory consumption.
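The difference between the two search methods can be sketched as follows. The hyperparameter names and values are hypothetical, and drawing the random combinations without repetition is an assumption; the article does not say how the platform samples.

```python
import itertools
import random

# Hypothetical hyperparameter space; names and values are illustrative only.
space = {
    "hidden_layers": [1, 2, 3],
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
}

def exhaustive_search(space):
    """Every possible combination of the selected values."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]

def randomised_search(space, n_models, seed=0):
    """N distinct combinations drawn at random from the full set."""
    all_combinations = exhaustive_search(space)
    rng = random.Random(seed)
    return rng.sample(all_combinations, min(n_models, len(all_combinations)))

grid = exhaustive_search(space)                    # 3 * 3 * 2 = 18 combinations
sampled = randomised_search(space, n_models=10)    # 10 of those 18

print(len(grid), len(sampled))                     # 18 10
```

The exhaustive grid grows multiplicatively with every value added to any list, which is why trimming the lists is the main lever for keeping an exhaustive search feasible.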

Number of models to compare

This parameter only shows up for Randomised Search and defines the total number of parameter combinations which are evaluated. The default value is 10 which is a reasonable choice in most cases. The higher the value the longer the HPO takes to finish and the higher the memory consumption of the step.
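As a rough cost estimate, each of the N combinations is trained once per cross validation fold, and the winning combination is trained once more on the full dataset. The helper below is a hypothetical illustration of that arithmetic, not a function of the product.

```python
def total_trainings(n_combinations, n_folds):
    """K cross validation fits per combination, plus one final fit on all data."""
    return n_combinations * n_folds + 1

default_cost = total_trainings(n_combinations=10, n_folds=5)
larger_cost = total_trainings(n_combinations=50, n_folds=5)

print(default_cost)   # 51
print(larger_cost)    # 251
```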

How to define the hyperparameter space

Each hyperparameter is already filled with predefined values which represent typical and reasonable choices. You can change the list of selected values.

Click on the selection field.
A list of pre-defined values with checkboxes on their left side will appear in a separate dialogue box. Mark a checkbox if you want to include a value. It will appear in the list of values.
If you want to add a value other than the pre-defined options, you can use the field Or add a custom value at the bottom of the box.

You can remove any value from the list by clicking the small x beside each number.

On the right side of each hyperparameter field there is a label indicating if a parameter is considered in grid search or not. As soon as more than one value is selected a green tick appears.

If only one value is selected a grey x-mark will appear instead. This value will be kept constant in the hyperparameter optimisation process.
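The rule behind the two labels can be expressed in a few lines. The parameter names and values below are hypothetical; the sketch only shows how a selection divides into searched and constant hyperparameters.

```python
# Hypothetical selection as it might look after editing the value lists.
selected_values = {
    "hidden_layers": [1, 2, 3],
    "learning_rate": [0.01],
    "batch_size": [32, 64],
}

def split_search_space(selected_values):
    """Separate hyperparameters that are searched from those kept constant."""
    # More than one value selected: part of the search (green tick).
    searched = {name: vals for name, vals in selected_values.items() if len(vals) > 1}
    # Exactly one value selected: held constant (grey x-mark).
    constant = {name: vals[0] for name, vals in selected_values.items() if len(vals) == 1}
    return searched, constant

searched, constant = split_search_space(selected_values)
print(sorted(searched))   # ['batch_size', 'hidden_layers']
print(constant)           # {'learning_rate': 0.01}
```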

Hyperparameter optimisation process

Hyperparameter optimisation always entails cross validation. Cross validation comes with reasonable default settings (see section Cross Validation above). Unless you want to change any of those settings, you can click Apply and the model training starts. The following steps are applied:

For each of the N parameter combinations a cross validation is performed. This means for a specific set of parameters K models are trained (according to the number of folds selected for cross validation).
This results in K cross validation scores for each parameter combination. From those an average cross validation score is calculated.
After cross validation is done for all N combinations (that is, N × K model trainings), all average cross validation scores are compared and the combination with the smallest score is selected. All other models and their cross validation scores are discarded and can’t be accessed after the step has completed.
The model with the best (= smallest) score is then trained on the entire training dataset and made available in the notebook. The hyperparameter setting of the model is printed in an info box.
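The whole process can be sketched end to end. This is a minimal illustration under stated assumptions: the stand-in model is a one-feature nearest-neighbour regressor with a single hypothetical hyperparameter `n_neighbors`, the score is RMSE, and every function name is invented for the example.

```python
import math
import random

def knn_predict(train, x, n_neighbors):
    """Average the targets of the n_neighbors training rows closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)

def rmse(train, test, n_neighbors):
    """Root mean squared error on the held-back rows."""
    errors = [(y - knn_predict(train, x, n_neighbors)) ** 2 for x, y in test]
    return math.sqrt(sum(errors) / len(errors))

def mean_cv_score(rows, n_neighbors, k):
    """K trainings for one parameter value, reduced to one average score."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(rmse(train, test, n_neighbors))
    return sum(scores) / k

def optimise(rows, candidates, k=5, seed=0):
    """Cross-validate every candidate, keep the smallest average score."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    averages = {n: mean_cv_score(rows, n, k) for n in candidates}
    best = min(averages, key=averages.get)
    # Final step: the winner is "trained" on all rows. For a nearest-neighbour
    # model that simply means keeping the full dataset.
    final_model = {"n_neighbors": best, "training_rows": rows}
    return best, averages, final_model

# Illustrative data: 60 points on a sine curve
rows = [(x / 10, math.sin(x / 10)) for x in range(60)]
best, averages, final_model = optimise(rows, candidates=[1, 3, 15], k=5)
print(best, averages)
```

Only `best` and `final_model` would survive in the described workflow; the per-candidate averages are shown here purely to make the comparison visible.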

How the models apply these settings

For these models all options described above are available:

Nearest Neighbor Regression
Neural Network
Polynomial Regression
Decision Tree Regression
Random Forest Regression
Support Vector Regression

The following models don’t have any hyperparameters which could be optimised. Therefore, only cross validation is available as an option:

Gaussian Processes Regression
Linear Regression

The Advanced Options panel looks different for these two models:

There is only a single checkbox to Enable Cross Validation. If the option is disabled the model mode is equivalent to No model comparison. If the option is enabled the same cross validation options are displayed as shown above.