TABLE OF CONTENTS
- No model comparison
- Cross Validation
- Cross Validation and Hyperparameter Optimisation
- How the models apply these settings
All tabular models provide Advanced Options. For most models, the Advanced Options panel offers three options, each described in its own section below (see the last section for exceptions):
No model comparison
This is the default setting for each model if you don’t change the advanced settings. With this option selected, the advanced options expose all available hyperparameters and their default values (unless you already changed the values). You can change any hyperparameter and train the model with those changes applied.
If this option is enabled, a single model is trained with the given set of hyperparameters. Training uses the entire dataset selected in the model’s Data field.
Cross Validation
If you select Cross Validation, the hyperparameters behave as described above: a single set of hyperparameters is used in model training.
Cross validation options
If cross validation is selected, another set of options is displayed at the bottom of the Advanced Options panel. Refer to this article on Cross Validation for the meaning of these options.
Cross Validation Splitting Strategy | Two options are available: K-Fold (default setting) or Group K-Fold. Refer to this article to learn about these. |
Column to group K-fold CV on | Only available if Group K-Fold was selected. Select one of the columns of your dataset by which the K-Fold should be grouped. |
Number of Folds | The default value is 5 which is a reasonable choice in most cases ( |
Cross-validation scoring metric | The cross validation is done based on this metric. Available options are:
|
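The difference between the two splitting strategies can be illustrated with a small sketch. It assumes that Group K-Fold keeps all rows sharing the same value in the grouping column inside one fold; the greedy balancing used here, the function name group_k_fold and the column name batch are illustrative assumptions, not the product's implementation.

```python
from collections import defaultdict

def group_k_fold(groups, k):
    """Assign whole groups to k folds so that no group is split across folds."""
    rows_by_group = defaultdict(list)
    for row, group in enumerate(groups):
        rows_by_group[group].append(row)

    folds = [[] for _ in range(k)]
    # Place the largest groups first, always into the currently smallest fold
    # (an assumed balancing rule, chosen only to keep fold sizes similar).
    by_size = sorted(rows_by_group, key=lambda g: len(rows_by_group[g]), reverse=True)
    for group in by_size:
        smallest_fold = min(folds, key=len)
        smallest_fold.extend(rows_by_group[group])
    return folds

# Hypothetical grouping column: one entry per dataset row.
batch = ["A", "A", "B", "B", "B", "C", "D", "D"]
folds = group_k_fold(batch, k=2)

for number, fold in enumerate(folds):
    print(number, sorted({batch[row] for row in fold}))
```

With plain K-Fold, rows of the same group can land in both the training and the test part of a split. Grouping avoids this, which matters when rows within a group are strongly related.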
Cross validation process
Once all parameters are set as required and you click Apply, cross validation runs the following process:
- The dataset will be split into K random subsets.
- The model is trained K times, each time with a different train-test split of the folds. In each run one fold is held back for testing, and a different fold is held back in each of the K runs.
- For each of the K training runs a cross validation score is calculated on the unseen test data.
- The K models are not stored or published. Only their cross validation scores are stored and made available to other steps. You can use the function Model Evaluation to view the cross validation scores of a model with cross validation enabled.
- After cross validation is complete, the model is trained one more time on the entire dataset. This is the model that the function makes available. It is the same model you would get with No model comparison, because this final training step is identical to the training in that mode.
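The process above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the platform's code: the "model" is a toy that always predicts the mean of its training targets, the score is the mean squared error, and all function names are hypothetical.

```python
import random
from statistics import mean

def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_mean_model(ys):
    """Toy model: it always predicts the mean of the training targets."""
    return mean(ys)

def mean_squared_error(prediction, ys):
    return mean((y - prediction) ** 2 for y in ys)

def cross_validate(ys, k=5, seed=0):
    scores = []
    for test_fold in k_fold_indices(len(ys), k, seed):
        held_out = set(test_fold)
        train_ys = [y for i, y in enumerate(ys) if i not in held_out]
        test_ys = [ys[i] for i in test_fold]
        fold_model = fit_mean_model(train_ys)                    # trained on K-1 folds
        scores.append(mean_squared_error(fold_model, test_ys))   # scored on unseen fold
    # The K fold models are discarded; only their scores are kept.
    final_model = fit_mean_model(ys)                             # final fit on all rows
    return scores, final_model

targets = [float(v) for v in range(20)]
cv_scores, final_model = cross_validate(targets, k=5)
print(cv_scores, final_model)
```

Note that the final model does not depend on the folds at all. The cross validation scores only estimate how well a model trained this way generalises to unseen data.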
Cross Validation and Hyperparameter Optimisation
If you select Hyperparameter Optimisation (HPO), the options panel changes. Instead of single values, you can select a list of values for each hyperparameter. Each parameter is already pre-filled with default settings.
Single values for hyperparameters | List of values for hyperparameters |
Hyperparameter optimisation options
In the example above you see the settings for a Neural Network. We don’t discuss the specific hyperparameters of that model here but just the general options and how to set up the hyperparameter optimisation.
Search Method | Two methods are available to run the optimisation.
|
Number of models to compare | This parameter only shows up for Randomised Search and defines the total number of parameter combinations which are evaluated. The default value is 10 which is a reasonable choice in most cases. The higher the value the longer the HPO takes to finish and the higher the memory consumption of the step. |
How to define the hyperparameter space
Each hyperparameter is already filled with predefined values which represent typical and reasonable choices. You can change the list of selected values.
- Click on the selection field.
- A list of pre-defined values with checkboxes on their left side will appear in a separate dialogue box. Mark a checkbox if you want to include a value. It will appear in the list of values.
- If you want to add a value other than the pre-defined options, you can use the field Or add a custom value at the bottom of the box.
- You can remove any value from the list by clicking the small x beside each number.
On the right side of each hyperparameter field there is a label indicating if a parameter is considered in grid search or not. As soon as more than one value is selected a green tick appears.
If only one value is selected a grey x-mark will appear instead. This value will be kept constant in the hyperparameter optimisation process.
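To see how the value lists translate into parameter combinations, consider the following sketch. It assumes that a grid search evaluates every combination of the selected values and that Randomised Search draws a fixed number of these combinations without repetition. The hyperparameter names and values are hypothetical and do not correspond to a specific model.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "hidden_layers": [1, 2, 3],       # several values: varied in the search
    "learning_rate": [0.001, 0.01],   # several values: varied in the search
    "activation": ["relu"],           # single value: kept constant
}

def grid_combinations(space):
    """Every combination of the selected values (Cartesian product)."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]

def random_combinations(space, n_models, seed=0):
    """Draw n_models distinct combinations (assumed behaviour of a randomised search)."""
    grid = grid_combinations(space)
    return random.Random(seed).sample(grid, min(n_models, len(grid)))

grid = grid_combinations(search_space)
varied = [name for name, values in search_space.items() if len(values) > 1]
sampled = random_combinations(search_space, n_models=4)

print(len(grid), varied, len(sampled))
```

A parameter with a single selected value does not increase the number of combinations, whereas every additional value on a varied parameter multiplies it.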
Hyperparameter optimisation process
Hyperparameter optimisation always entails cross validation. Cross validation comes with reasonable default settings (see section Cross Validation above). Unless you want to change any of those settings, you can click Apply and the model training starts. The following steps are applied:
- For each of the N parameter combinations a cross validation is performed. This means for a specific set of parameters K models are trained (according to the number of folds selected for cross validation).
- This results in K cross validation scores for each parameter combination. From those an average cross validation score is calculated.
- After cross validation is done for all N models (that is, N × K model trainings), all average cross validation scores are compared and the model with the smallest score is selected. All other models and their cross validation scores are rejected and can’t be accessed after the step has completed.
- The model with the best (= smallest) score is finally trained on the entire training dataset and made available in the notebook. The hyperparameter setting of the model is printed in an info box.
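The whole optimisation loop can be sketched as follows. This is a simplified illustration, not the platform's implementation: it assumes a grid over a single hypothetical hyperparameter n_neighbors, a toy nearest-neighbour regressor on one input column, and the mean squared error as the scoring metric (smaller is better).

```python
import itertools
import random
from statistics import mean

def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def knn_predict(train, x, n_neighbors):
    """Toy regressor: average the targets of the n_neighbors closest (x, y) points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:n_neighbors]
    return mean(y for _, y in nearest)

def cv_scores(data, params, k):
    """One score per fold: K model trainings for one parameter combination."""
    scores = []
    for test_fold in k_fold_indices(len(data), k):
        held_out = set(test_fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        errors = [(knn_predict(train, data[i][0], **params) - data[i][1]) ** 2
                  for i in test_fold]
        scores.append(mean(errors))
    return scores

def optimise(data, space, k=5):
    names = list(space)
    results = []
    for values in itertools.product(*space.values()):            # N combinations
        params = dict(zip(names, values))
        results.append((mean(cv_scores(data, params, k)), params))  # average of K scores
    best_score, best_params = min(results, key=lambda r: r[0])   # smallest score wins
    return best_score, best_params, len(results) * k

data = [(float(x), 2.0 * x) for x in range(30)]
best_score, best_params, n_trainings = optimise(data, {"n_neighbors": [1, 3, 15]}, k=5)

# Final step: the winning setting is used once more on the entire dataset.
def final_model(x):
    return knn_predict(data, x, **best_params)

print(best_params, n_trainings)
```

Here three combinations and five folds give 3 × 5 = 15 trainings plus the final one, which is why larger value lists and more folds lengthen the step.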
How the models apply these settings
For these models all options described above are available:
- Nearest Neighbor Regression
- Neural Network
- Polynomial Regression
- Decision Tree Regression
- Random Forest Regression
- Support Vector Regression
The following models don’t have any hyperparameters which could be optimised. Therefore, only cross validation is available as an option:
- Gaussian Processes Regression
- Linear Regression
The Advanced Options panel looks different for these two models:
There is only a single checkbox to Enable Cross Validation. If the option is disabled, the model mode is equivalent to No model comparison. If the option is enabled, the same cross validation options are displayed as shown above.