Validation Plot

Modified on Mon, 3 Apr, 2023 at 8:45 AM

Description

While Predicted vs Actual provides a point-by-point comparison for each data point, Validation Plot makes it possible to compare the predicted and actual values as line plots. The actual and predicted values are plotted against another variable in your dataset.

Application

There are two typical applications for this step. The first is series data of any kind, such as time series or frequency series. In that case you would typically plot the output variable against the column that defines the series (e.g. time or frequency).

The second application is sweeping the output along a parameter. For example, you could plot the drag coefficient of a car's rear wing against the angle of attack.

How to use

Create the step and assign a tabular dataset to it in the Data field. The step should usually be run on test data that the model did not see during training.
Select the Models for which you want to create the comparison. The selected models should have at least one input and one output in common.
- If a model calculates uncertainty, this is also plotted as a transparent red band around its prediction.
In the X axis field, you can choose any of the columns available in the dataset.
- The step is flexible in what it accepts for the x-axis. Nevertheless, the most useful results come from a series variable (like time) or from a variable with a clear and relatively smooth relationship to the y-axis (the output).
In the Y axis field, you can choose one output common to all selected models.
If your dataset contains data from more than one test run, you can use the option Column grouping individual tests to identify each test and plot a separate line for each one. The column assigned here should contain a unique identifier for each test.
Click Apply to run the step and plot the data.
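The following is a minimal sketch, in plain Python, of the kind of data preparation behind such a plot; it is not the platform's implementation. It assumes each row is a dict, that predictions sit in their own column, and that uncertainty is given as a standard deviation drawn as a band of plus or minus k standard deviations (k=2 is an assumed value; this article does not say how wide the band the step draws is). The helper build_line_series and all column names are hypothetical. Rows are kept in dataset order, mirroring the behaviour described under Sort data before plotting.

```python
def build_line_series(rows, x_col, y_col, pred_col, group_col=None, std_col=None, k=2.0):
    """Split rows into line series, one per test, preserving dataset order.

    Hypothetical helper. std_col (optional) is assumed to hold a predicted
    standard deviation; the band is prediction +/- k * std (k is assumed).
    """
    series = {}  # dicts keep insertion order, so tests appear in first-seen order
    for row in rows:
        key = row[group_col] if group_col is not None else None
        s = series.setdefault(
            key, {"x": [], "actual": [], "predicted": [], "lower": [], "upper": []}
        )
        s["x"].append(row[x_col])
        s["actual"].append(row[y_col])
        s["predicted"].append(row[pred_col])
        if std_col is not None:
            s["lower"].append(row[pred_col] - k * row[std_col])
            s["upper"].append(row[pred_col] + k * row[std_col])
    return series


# Toy rows with hypothetical column names: two tests, interleaved in the dataset.
rows = [
    {"test": "A", "time": 0.0, "force": 1.0, "force_pred": 1.0, "force_std": 0.25},
    {"test": "B", "time": 0.0, "force": 2.0, "force_pred": 2.0, "force_std": 0.25},
    {"test": "A", "time": 0.5, "force": 1.5, "force_pred": 1.25, "force_std": 0.25},
]
series = build_line_series(
    rows, "time", "force", "force_pred", group_col="test", std_col="force_std"
)
print(series["A"]["x"])      # [0.0, 0.5]
print(series["B"]["upper"])  # [2.5]
```

Each entry could then be drawn with matplotlib, for example ax.plot(s["x"], s["actual"]) and ax.plot(s["x"], s["predicted"]) for the two lines, plus ax.fill_between(s["x"], s["lower"], s["upper"], color="red", alpha=0.2) for a transparent band.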

Examples

Plot output versus time

This example is from Challenge 2 - Automotive Track Dynamics. The plot below shows the predicted force at the rear wheel for a particular driving manoeuvre. In this example, the predictions of two models are compared with the actual measured data.

Here, the plot shows that both models accurately predicted the behaviour for a new manoeuvre, which can increase the user's trust in these models.

Sweep output along another parameter

This example is from Tutorial 1.2, which covers the performance of a car's rear wing. Validation Plot is used to sweep the Lift Coefficient along the Angle of Attack. Here, a model with uncertainty was used, and its uncertainty is shown together with its prediction and the actual data.

Here, the user can see that although the predictions are not perfect, they are quite accurate, and the true values always lie within the uncertainty band.
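One way to back up a statement like this with a number is to count how many actual values fall inside the band. The sketch below assumes the band is the prediction plus or minus k standard deviations (k=2 is an assumption, not a documented definition of the band this step draws). band_coverage is a hypothetical helper, and the values are toy numbers, not data from the tutorial.

```python
def band_coverage(actual, predicted, std, k=2.0):
    """Return the fraction of actual values inside predicted +/- k * std.

    Hypothetical helper; the band definition (k standard deviations) is assumed.
    """
    if not actual or not (len(actual) == len(predicted) == len(std)):
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(
        1 for a, p, s in zip(actual, predicted, std) if p - k * s <= a <= p + k * s
    )
    return inside / len(actual)


# Toy values: the third actual value (3.0) lies outside its band [1.5, 2.5].
print(band_coverage([1.0, 2.0, 3.0], [1.0, 2.5, 2.0], [0.5, 0.5, 0.25]))
```

A coverage of 1.0 corresponds to the situation described above, where every true value lies within the band.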

More on this step

Sort data before plotting

The data is plotted (and the points connected) in the order in which it appears in the dataset. Most steps randomise the data and do not guarantee any particular order. If Validation Plot is applied to such data directly, the resulting line plot looks confusing (lots of zigzagging, with the line going back and forth). It is therefore often necessary to use Sort by Columns to sort the dataset by the x-axis column in ascending order.
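As a rough illustration of why sorting matters, the plain-Python sketch below (not the Sort by Columns step itself) sorts toy rows by the x-axis column and counts how often the connecting line reverses direction along x. The helpers sort_by_x and direction_changes and the column names aoa and cl are hypothetical.

```python
def sort_by_x(rows, x_col):
    """Return a new list of rows sorted by x_col, ascending (Python's sort is stable)."""
    return sorted(rows, key=lambda row: row[x_col])


def direction_changes(xs):
    """Count how often the line switches between moving right and moving left in x."""
    steps = [b - a for a, b in zip(xs, xs[1:]) if b != a]
    return sum(1 for s, t in zip(steps, steps[1:]) if (s > 0) != (t > 0))


# Toy rows in a shuffled order, as earlier steps may leave them.
shuffled = [
    {"aoa": 10, "cl": 0.9},
    {"aoa": 0, "cl": 0.2},
    {"aoa": 15, "cl": 1.1},
    {"aoa": 5, "cl": 0.6},
]
ordered = sort_by_x(shuffled, "aoa")

print([r["aoa"] for r in ordered])                      # [0, 5, 10, 15]
print(direction_changes([r["aoa"] for r in shuffled]))  # 2
print(direction_changes([r["aoa"] for r in ordered]))   # 0
```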

If your dataset is not a single series but a set of series (i.e. the option Column grouping individual tests is used), make sure to sort by this column as well as by the x-axis column. That way each individual test is sorted along the x-axis in ascending order, which is what is needed here.
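The same idea extends to grouped data by sorting on a compound key. In this sketch, the helper sort_by_group_and_x and the column names are hypothetical, and the test IDs are assumed to be mutually comparable (e.g. all strings). Rows are sorted by test ID first and then by x, so each test ends up contiguous and ascending along the x-axis.

```python
def sort_by_group_and_x(rows, group_col, x_col):
    """Sort rows by test ID first, then by the x-axis column, both ascending."""
    return sorted(rows, key=lambda row: (row[group_col], row[x_col]))


rows = [
    {"test": "B", "time": 0.2},
    {"test": "A", "time": 0.1},
    {"test": "B", "time": 0.0},
    {"test": "A", "time": 0.0},
]
ordered = sort_by_group_and_x(rows, "test", "time")
print([(r["test"], r["time"]) for r in ordered])
# [('A', 0.0), ('A', 0.1), ('B', 0.0), ('B', 0.2)]
```

With a pandas DataFrame, the equivalent would be df.sort_values(["test", "time"]).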

Step combination: Sort by Column + Validation Plot

Reduce size of dataset to increase clarity

If your dataset is large and contains many test series, plotting all of them at once makes the plot very cluttered and probably no longer helpful: both the predicted and the actual data are plotted for each series, so the plot contains #Series × 2 lines (more when several models are selected). In that case, you can add a Filter Category step before Validation Plot and apply the filter to the ID column that you also use for grouping individual tests. You can then either filter down to a single ID and go through all IDs in the test set interactively, or keep a small number of IDs that still yields a readable plot (and scan through all IDs in the dataset that way as well).
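The sketch below illustrates the filtering idea and the resulting line count in plain Python; it is not the Filter Category step. It counts one actual line plus one predicted line per selected model for each series (uncertainty bands are not counted), which reduces to #Series × 2 for a single model. The helpers filter_ids, line_count and id_batches and the column test_id are hypothetical.

```python
def filter_ids(rows, id_col, keep):
    """Keep only rows whose test ID is in `keep`."""
    keep = set(keep)
    return [row for row in rows if row[id_col] in keep]


def line_count(rows, id_col, n_models=1):
    """Lines drawn: per series, one actual line plus one predicted line per model."""
    return len({row[id_col] for row in rows}) * (1 + n_models)


def id_batches(rows, id_col, size):
    """Split the sorted test IDs into batches of `size` to scan through them."""
    ids = sorted({row[id_col] for row in rows})
    return [ids[i:i + size] for i in range(0, len(ids), size)]


# Toy dataset: 10 tests with 3 rows each.
rows = [{"test_id": i, "t": t} for i in range(1, 11) for t in (0.0, 0.1, 0.2)]

print(line_count(rows, "test_id"))     # 20
subset = filter_ids(rows, "test_id", [3, 7])
print(line_count(subset, "test_id"))   # 4
print(id_batches(rows, "test_id", 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```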

Step combination: Sort by Column + Filter Category + Validation Plot