Description
This step enables you to compare one or more models against each other, and against different accuracy criteria.
Application
Once models are trained, it is not always clear whether they are “good enough” to be used in production. It might be easy to identify which model is best, but that alone is not always enough to make it usable in production. This step enables you to define accuracy criteria and assess whether different models can achieve them. If an accuracy criterion is not fulfilled, the step can also be used to see how far the model is from reaching it.
The step also provides an option to compare the models to a simple baseline model.
How to use
You need at least one model and one data set to use this step.
- Select one or more Models. Make sure that these models have at least one output in common.
- Select a dataset in the Test Set field. For the results to be meaningful, it is recommended to use a dataset that wasn’t used to train the models (e.g. a test set).
- Select an Output. If no outputs are suggested, it is because the models do not have any output in common.
- You can then decide whether you want to compare the models to a baseline (Compare to baseline).
- The baseline is the average value of the outputs in the training set of the first model.
- Then, you can decide whether you want to create a Criterion. A criterion has the following format: “At least [F] % of the data is predicted with an error smaller than [E] [Unit]”, where F, E and Unit are chosen by the user depending on their use case. The unit of the error can be % or the unit of the prediction (absolute value).
- Note that for prediction values close to 0, using errors in percentage might give meaningless results.
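If you want to verify results outside the tool, the criterion check is easy to reproduce. Below is a minimal sketch assuming NumPy; all names (`criterion_met`, `y_true`, `y_pred`, …) and the toy data are illustrative, not part of the product:

```python
import numpy as np

def criterion_met(y_true, y_pred, f_pct, e_max, unit="absolute"):
    """At least f_pct % of the data must be predicted with an error
    smaller than e_max (in absolute units or in %)."""
    err = np.abs(y_pred - y_true)
    if unit == "%":
        # Percentage error: unstable when y_true is close to 0 (see note above).
        err = 100.0 * err / np.abs(y_true)
    frac_below = 100.0 * np.mean(err < e_max)
    return frac_below >= f_pct, frac_below

# Toy data standing in for a test set and a model's predictions.
rng = np.random.default_rng(0)
y_true = rng.uniform(100.0, 1000.0, size=500)        # e.g. stress in MPa
y_pred = y_true + rng.normal(0.0, 60.0, size=500)    # model with ~60 MPa error

# Baseline as described above: a constant prediction at the mean output of
# the training set (approximated here with the test-set mean for brevity).
y_base = np.full_like(y_true, y_true.mean())

print(criterion_met(y_true, y_pred, f_pct=80, e_max=200))  # model
print(criterion_met(y_true, y_base, f_pct=80, e_max=200))  # baseline
```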
Once the step is applied, you will get a plot looking like the one below. Here are a few points on how to read such a plot:
- Each model is represented by a curve. The steeper the initial slope and the faster the curve reaches 100%, the better the model (see black arrow). In this example, model 1 is bad, models 2 and 3 are good, and model 4 is perfect (which is not realistic).
- A criterion is represented by a cross (+). In the example below, the criterion is that at least a fraction F of the data should be predicted with an error smaller than E.
- A model fulfills a criterion if its curve passes the cross above and to the left (see green arrow). In this case, models 3 and 4 would fulfill the criterion.
- A model does not fulfill a criterion if its curve passes the cross below and to the right (see red arrow). In this case, models 1 and 2 would not fulfill the criterion.
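Each curve is the empirical cumulative distribution of the model’s absolute prediction error on the test set. For readers who want to reproduce such a plot outside the tool, here is a minimal sketch assuming NumPy and Matplotlib (names and toy data are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_error_curve(y_true, y_pred, label):
    """Plot the fraction of predictions (y-axis) whose absolute error
    falls below each threshold (x-axis)."""
    err = np.sort(np.abs(y_pred - y_true))
    frac = 100.0 * np.arange(1, err.size + 1) / err.size
    plt.plot(err, frac, label=label)

rng = np.random.default_rng(1)
y_true = rng.uniform(100.0, 1000.0, size=300)
plot_error_curve(y_true, y_true + rng.normal(0, 150, 300), "model 1")
plot_error_curve(y_true, y_true + rng.normal(0, 40, 300), "model 2")

# A criterion "at least F % with error < E" is the point (E, F); it is
# fulfilled by any curve passing above and to the left of the cross.
plt.scatter([200], [80], marker="+", s=150, color="black", label="criterion")
plt.xlabel("Absolute error")
plt.ylabel("% of data predicted within error")
plt.legend()
plt.show()
```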
Examples
In this example, three neural networks were trained to predict the stress value (in MPa) within a component. Two criteria were created:
- The first one, for prototyping purposes: 80% of the predictions must have an error smaller than 200 MPa.
- The second, more restrictive, for design purposes: 80% of the predictions must have an error smaller than 50 MPa.
The step was run and the resulting plot is displayed below. The plot can be used to reach the following conclusions:
- The baseline is really bad (see grey line).
- The three neural networks easily fulfill the prototyping criterion (red cross on the right).
- Neural Network 1 doesn’t fulfill the design criterion.
- Neural Network 2 just about fulfills the design criterion.
- Neural Network 3 fulfills the design criterion and is better than Neural Network 2.
Although this step can be extremely useful for deciding whether a model can be used to make predictions, it is not suitable for assessing outliers or large errors. For example, a model might predict 90% of the data with a certain accuracy, but the plot tells us nothing about the accuracy on the remaining 10%. In such a case, this step should be run alongside the Predicted vs. Actual step, which highlights large errors better.
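As an illustration of why the two views complement each other, the sketch below (again with toy data; this is not the step’s actual implementation) shows how a predicted-vs-actual scatter plot exposes the few large errors that an error-distribution curve can hide:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
y_true = rng.uniform(100.0, 1000.0, size=300)
y_pred = y_true + rng.normal(0.0, 40.0, size=300)
y_pred[:15] += rng.normal(0.0, 500.0, size=15)  # ~5% outliers with large errors

plt.scatter(y_true, y_pred, s=10)
lims = [y_true.min(), y_true.max()]
plt.plot(lims, lims, "k--", label="perfect prediction")
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.legend()
plt.show()
# Points far from the dashed line are the large errors the text warns about.
```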