Compare Performance Metrics

Description

This step lets you quantitatively compare the performance of different models by looking at a set of conventional metrics.


Application

Quantitative metrics help you decide whether a trained model is good enough to be used in production. They also help you identify which of several models performs best.


How to use

To use this step, you need at least one trained model and a test data set (data that was not used to train the model).

  • Select the Data you want to assess the models on.
  • Select the Models that you want to assess.
  • You then have the option to sort the table by one of the metrics (Metric to sort table by). This is optional and can be left empty.
  • Give a name to the table that will be created (Name of generated table). This table can then be either exported, or used in future steps in this notebook.
  • Click Apply to calculate the metrics and generate the table with the results. Each output of each selected model will be one row in the final table.
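These metrics are computed automatically when you click Apply. If you want to reproduce a comparison table outside the notebook, a minimal sketch using scikit-learn, SciPy, and pandas could look like the following (the data and models here are toy placeholders, not the tool's internal implementation):

```python
import pandas as pd
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Toy regression data standing in for your own training and test sets.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two example models, mirroring the comparison described in this article.
models = {
    "Neural Network": MLPRegressor(max_iter=2000, random_state=0),
    "Gaussian Process Regression": GaussianProcessRegressor(),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rows.append({
        "Model": name,
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mean_squared_error(y_test, y_pred),
        "R2": r2_score(y_test, y_pred),
        "Pearson": pearsonr(y_test, y_pred)[0],
    })

# One row per model; sorting by a metric mirrors "Metric to sort table by".
table = pd.DataFrame(rows).sort_values("MSE")
print(table)
```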


Examples

In this example, we are comparing two models (Neural Network and Gaussian Process Regression), which are both predicting two outputs (Initial_stiffness and Ultimate_strain).

If you want to find the best model for predicting Initial_stiffness according to the mean square error (MSE) metric, compare the MSE values for that output: the Neural Network has a lower MSE than the Gaussian Process Regression and is therefore the better choice.


More on this step

Here is a detailed list of the different metrics that are displayed and how to use them:

Mean Absolute Error (MAE)

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the true, predicted, and mean of the true values respectively.

The mean absolute error quantifies the overall prediction error by averaging the absolute values of the residuals, i.e. the differences between the predicted and true values. Because the absolute value is used, the "direction" of the error is not considered in this metric.

Mean Square Error (MSE)

The mean square error quantifies the prediction error of all data points by measuring the average of the squares of the error between predicted and actual values. Using the MSE rather than the MAE will result in penalising larger errors even more. 
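Using the same notation as for the MAE, the MSE can be written as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$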

R-Squared (R2)

The R-squared metric quantifies the degree to which a model approximates the actual values. It does so by comparing the model to a baseline that always predicts the average of the true values.
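With the notation above, R2 can be written as one minus the ratio of the model's squared errors to those of the average-value baseline:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$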

  • If the predictions of the model are much better than predicting the average value, the ratio decreases and the R2 value increases towards 1 (a perfect model).
  • If the predictions of the model are similar to predicting the average value, the ratio is around 1 and the R2 value is around 0. This means that the model is only about as good as predicting the average value (which is not a very good model).
  • If the predictions of the model are worse than predicting the average value, the ratio becomes greater than 1 and the R2 value becomes negative. A negative R2 value means that the model is very poor, even worse than simply predicting the average each time.

Ideally, a good model should have a positive value close to 1.

Pearson correlation coefficient

The Pearson correlation coefficient measures how strongly two sets of values are linearly correlated; here it is computed between the predicted and the true values.
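Assuming the coefficient is computed between the true and predicted values (as the other metrics are), and writing $\bar{\hat{y}}$ for the mean of the predicted values, it reads:

$$r = \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}}$$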

  • A correlation coefficient of 1 indicates that every positive change in one variable is accompanied by a proportional positive change in the other.
  • A correlation coefficient of 0 indicates that changes in one variable are not accompanied by any measurable change in the other, i.e. there is no correlation.
  • A correlation coefficient of -1 indicates that every positive change in one variable is accompanied by a proportional negative change in the other.

Here again, a good model should have a positive value close to 1.
