Dataset Prediction

Modified on Thu, 2 Feb, 2023 at 1:29 PM

Description

This function generates predictions for an entire dataset in a single step. The prediction results are added to the dataset as new columns. Predictions for multiple models can also be generated in the same step.

Application

All manipulators in the section Model > Model Evaluation (such as Predicted vs. Actual, Compare Performance Metrics, …) help you assess your model on unseen test data. If you want to post-process your data in a different way to assess model quality, you can run Dataset Prediction on your test dataset to get predictions for all data points and then use any Transform function to work on the predictions.

In general, whenever you want model predictions for a larger number of data points and don’t need direct visual feedback (as with Scalar Predictions, for example) but want to process the predictions in further steps, you can use Dataset Prediction.

You could also generate predictions on a larger number of data points and export the resulting dataset to use it in other tools downstream in your overall process.

How to use

Select the dataset to run the prediction on in the field Data.
Select all tabular models for which you want to get predictions in the field Models. If only a single model is available in the notebook when this step is created, that model is pre-selected.
If you enable the option Add error columns, the step additionally calculates the model error and adds the corresponding columns.
If this option is enabled, a new multi-selection field appears in which you can select one or more error metrics to include. A column is added for each selected error metric. Available error types are:
- Absolute Error
- Squared Error
- Percentage Error

For the Add error columns option to take effect, the dataset needs a column with the actual values for each model output. That is, there must be a column with exactly the same name (case-sensitive) as the output column that was used to train the model. If no such column is present but the option is enabled, the step still runs without raising an error, but no error columns are added.
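The error types and the column-name requirement can be illustrated with a minimal Python sketch. This is not the product's implementation: the function names are hypothetical, and the formula for Percentage Error (absolute deviation relative to the actual value) is an assumption, since the exact definition is not stated here.

```python
import math


def row_errors(actual, predicted):
    """Return the three error types for one data point (assumed formulas)."""
    diff = predicted - actual
    if actual != 0:
        # Assumption: percentage error is measured relative to the actual value.
        percentage = abs(diff) * 100 / abs(actual)
    else:
        percentage = math.nan  # undefined when the actual value is zero
    return {
        "Absolute Error": abs(diff),
        "Squared Error": diff ** 2,
        "Percentage Error": percentage,
    }


def has_actual_column(column_names, output_name):
    """Error columns need a column whose name matches the output exactly (case-sensitive)."""
    return output_name in column_names


errors = row_errors(actual=10, predicted=12)
strength_found = has_actual_column(["Length", "Width", "Strength"], "Strength")
lowercase_found = has_actual_column(["Length", "Width", "strength"], "Strength")
```

In this sketch a column named strength does not count as a match for the output Strength, which mirrors the case-sensitive behaviour described above.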

Click Apply to run the step.

The resulting dataset will have additional columns:

- One prediction column for each output for each model.
- One column for each error metric for each output for each model.

The new columns are named according to the following pattern:

{Output Name} ({Model Name})
{Output Name} ({Model Name} {Error Type})
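If you process the resulting dataset in other tools, it can help to build these column names programmatically. The following Python sketch only reproduces the naming pattern shown above; the helper names are hypothetical and not part of the product.

```python
def prediction_column(output_name, model_name):
    """Name of the prediction column for one output and one model."""
    return f"{output_name} ({model_name})"


def error_column(output_name, model_name, error_type):
    """Name of the error column for one output, one model and one error type."""
    return f"{output_name} ({model_name} {error_type})"


nn_prediction = prediction_column("Strength", "NN")
gp_abs_error = error_column("Strength", "GP", "Absolute Error")
```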

Examples

The following two examples illustrate how the function works.

The first example table includes inputs and outputs.

Length	Width	Strength
3	2	10
4	1	15
…	…	…

Let’s assume Length and Width are the inputs and Strength is the output we want to predict. We use a Neural Network (model name NN) and a Gaussian Process Regression (model name GP) and additionally want the Absolute Error to be calculated. Dataset Prediction would then produce the following table:

Length	Width	Strength	Strength (NN)	Strength (NN Absolute Error)	Strength (GP)	Strength (GP Absolute Error)
3	2	10	12	2	11	1
4	1	15	16	1	18	3
…	…	…	…	…	…	…

For the second example the table contains only the inputs. The model outputs are not included:

Length	Width
7	4
5	3
…	…

The configuration of the step is the same as in the first example. However, because the dataset has no output column, no error columns are added. The result looks like this:

Length	Width	Strength (NN)	Strength (GP)
7	4	21	23
5	3	11	11
…	…	…	…