Explain Predictions


Description

Explain Predictions analysis helps you better understand the relationships between the inputs and outputs of your models.


Application

This can help you find errors in a model when unexpected relationships show up; you can use these findings to improve your model in an iterative process. It can also reveal patterns you were not previously aware of and improve your understanding of the data behind the model.


How to use

When you first click on Explain Predictions, you have to configure the manipulator:

  • Choose a Model on which the analysis should be performed.
  • Next, choose the output for which you want to run the analysis in the field Output to explain.
  • Finally, choose whether you want to run the analysis on All the training data points or on A single newly-defined design.

The step produces a bar plot showing the importance of each input on the output selected above, i.e. how strongly each input parameter of the selected design affects the output value. Green bars indicate that the input value is increasing the output, while red bars indicate that the input value is decreasing it. Please note that Impact is measured on an arbitrary scale and is intended for comparative analysis only.
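If you want to experiment with this kind of explanation outside the platform, the open-source shap Python package can produce a comparable bar plot. The sketch below is purely illustrative and not the platform’s internal implementation; the model, data and feature names are made up.

```python
# Illustrative sketch using the open-source `shap` package; model, data and
# feature names are hypothetical.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "size param 1": rng.uniform(10, 50, 200),
    "size param 2": rng.uniform(1, 5, 200),
    "wall thickness": rng.uniform(0.5, 3.0, 200),
})
y = 0.02 * X["size param 1"] - 0.10 * X["size param 2"] + rng.normal(0, 0.05, 200)

model = RandomForestRegressor(random_state=0).fit(X, y)

# Compute SHAP values for the training data.
explainer = shap.Explainer(model.predict, X)
shap_values = explainer(X)

shap.plots.bar(shap_values)      # average importance across all data points
shap.plots.bar(shap_values[0])   # signed contributions for a single design
```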

Global explain prediction

With the option All the training data points, the analysis is performed on all training data points. That is, the Explain Predictions plot shows the importance of the inputs on the output averaged over the entire input design space.

For example, in the plot below, the size parameters have only a minor impact on the recyclability of the product, while the material and the manufacturing process have the biggest impact.
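As a simplified numerical illustration (with made-up SHAP values, not output from a real model), the global importance is simply the average magnitude of each input’s contribution across the training points:

```python
import numpy as np

# Hypothetical SHAP values for the recyclability output: one row per training
# point, one column per input. The numbers are made up for illustration.
feature_names = ["size param 1", "size param 2", "material", "manufacturing process"]
shap_vals = np.array([
    [ 0.01, -0.02,  0.30, -0.18],
    [-0.02,  0.01, -0.25,  0.22],
    [ 0.02,  0.02,  0.28, -0.20],
])

# Global importance: the mean absolute contribution of each input.
global_importance = np.abs(shap_vals).mean(axis=0)
for name, value in sorted(zip(feature_names, global_importance), key=lambda p: -p[1]):
    print(f"{name}: {value:.2f}")
```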

Local explain prediction

With the option A single newly-defined design, the analysis is done for a single design only (a specific set of input parameters). A set of sliders (for numerical inputs) and dropdown menus (for categorical inputs) appears below the bar plot; with these controls you define the design for which the local analysis should be performed. Naturally, this option produces results much faster.

In the example below, size param 2 has a much greater impact on the product’s recyclability, which was not captured by the global analysis. That is, for this specific design the input actually makes a significant contribution to the output, while that is not the case globally.
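For a single design, the signed contributions are additive: the base value (the average output over the training data) plus the SHAP values of the inputs equals the model prediction for that design. A minimal sketch with hypothetical numbers, loosely matching the example above:

```python
# Hypothetical local explanation for one newly-defined design. The base value
# is the average model output over the training data; each entry is the signed
# SHAP contribution of one input for this specific design.
base_value = 0.62
local_contributions = {
    "size param 1":          +0.01,
    "size param 2":          +0.21,   # large positive impact for this design
    "material":              -0.05,
    "manufacturing process": +0.04,
}

# SHAP values are additive: base value + contributions = model prediction.
prediction = base_value + sum(local_contributions.values())
print(f"Predicted recyclability for this design: {prediction:.2f}")
```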

For another design, though, the results can be completely different, as the next example shows:

This also shows why a local analysis can be important. The global analysis gives you an indication of which inputs are important or unimportant overall; based on that, you can decide which inputs should or should not be included in your model. However, inputs that are globally unimportant can still have a significant effect in some regions of your design space. You should therefore also check your design space locally in the regions that matter most for your engineering problem.


More on this step

The Explain Predictions manipulator uses the SHAP algorithm. This algorithm is significantly different from the algorithms used for sensitivity analysis. A sensitivity analysis chooses random samples in the design space, evaluates them with the trained models and breaks down how much of the variation in the output(s) can be explained by the different inputs. Phrased differently: how much do changes in the input variables cause changes in the outputs?
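For contrast, here is a simplified, self-contained sketch of that variance-based idea. The model and the binning-based estimate below are stand-ins for illustration, not the platform’s actual sensitivity algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def trained_model(x1, x2, x3):
    # Stand-in for a trained model; the coefficients are arbitrary.
    return 2.0 * x1 - 0.5 * x2 + 0.1 * x3

# Random samples in the design space, evaluated with the "trained" model.
n = 10_000
x1, x2, x3 = (rng.uniform(0.0, 1.0, n) for _ in range(3))
y = trained_model(x1, x2, x3)

# Rough first-order effect of each input: the share of the output variance
# explained by that input alone (variance of the conditional means, estimated
# by binning the input, divided by the total output variance).
for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    bins = np.digitize(x, np.linspace(0.0, 1.0, 21))
    conditional_means = np.array([y[bins == b].mean() for b in np.unique(bins)])
    share = conditional_means.var() / y.var()
    print(f"{name} explains roughly {share:.0%} of the output variance")
```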

The SHAP algorithm takes a completely different approach. Instead of asking “How much do changes of my input cause changes of my output?”, the question here is “How much does my model prediction change if I use or don’t use this feature as input?” That is, in a way, models with all sorts of input combinations are trained, and the overall impact of each input is derived from these combinations. The example below illustrates this.

Imagine a database containing information about the age, gender, job and salary of people. You use the first three as inputs to predict the salary. To determine the importance of the three inputs on the output “salary”, the SHAP algorithm starts from the average value of the output in the training dataset ($50k in our example). From there, the SHAP algorithm evaluates how the average model prediction changes if certain variables are (or are not) used as inputs. In the image below, all connections that are relevant for determining the importance of the input feature “age” are highlighted in red. The arrows always start at models which don’t use “age” as an input and compare how the model prediction changes if “age” is added as an input. The overall importance is a weighted average of all highlighted connections in the graph below.

In this example, you can see that “age” is clearly a significant input and, on average, decreases the predicted salary. In the bar plot, the input “age” would therefore appear as a red bar.
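To make the weighting concrete, here is a small, self-contained sketch of the Shapley calculation for this example. The average predictions per input subset are hypothetical numbers chosen only to illustrate the mechanics; only the dataset average of $50k is taken from the text above.

```python
from itertools import chain, combinations
from math import factorial

features = ["age", "gender", "job"]

# v(S): hypothetical average model prediction (in k$) when only the inputs in
# S are used. The empty set corresponds to the dataset average of 50 k$.
v = {
    frozenset(): 50,
    frozenset({"age"}): 44,
    frozenset({"gender"}): 51,
    frozenset({"job"}): 58,
    frozenset({"age", "gender"}): 45,
    frozenset({"age", "job"}): 52,
    frozenset({"gender", "job"}): 59,
    frozenset({"age", "gender", "job"}): 53,
}

def shapley(feature):
    """Weighted average of the prediction change when `feature` is added."""
    others = [f for f in features if f != feature]
    n = len(features)
    subsets = chain.from_iterable(combinations(others, r) for r in range(len(others) + 1))
    total = 0.0
    for subset in subsets:
        s = frozenset(subset)
        weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        total += weight * (v[s | {feature}] - v[s])
    return total

for f in features:
    print(f, round(shapley(f), 1))   # "age" comes out negative -> red bar
```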


The method used in this step is non-deterministic, which means that you might obtain different results each time you run it. Check this article to learn more.
