Intelligent Correlation

Modified on Tue, 4 Apr, 2023 at 3:30 PM

Description

The Intelligent Correlation tool summarises the simple one-to-one relationships between pairs of variables in your dataset. For each pair of variables, the tool fits linear, power-law and exponential curves to the data points and reports the one that fits best.
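
The exact fitting procedure and the definition of "strength" are not documented here. The sketch below assumes one common approach: each relationship type is linearised (a power law as log y against log x, an exponential as log y against x), fitted by least squares, and scored by the R² of that linearised fit. The type with the highest score is taken as the best fit. The names `fit_linearised` and `best_fit` are hypothetical and are not part of the tool.

```python
import numpy as np


def _r2(v, v_hat):
    """Coefficient of determination of predictions v_hat against observations v."""
    ss_res = np.sum((v - v_hat) ** 2)
    ss_tot = np.sum((v - v.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot) if ss_tot > 0 else 0.0


def fit_linearised(x, y, kind):
    """Fit one relationship type by least squares on (possibly log-transformed) data.

    linear:      y = a + b*x       -> fit y      against x
    power:       y = a * x**b      -> fit log(y) against log(x)  (needs x > 0 and y > 0)
    exponential: y = a * exp(b*x)  -> fit log(y) against x       (needs y > 0)

    Returns (a, b, r2), with r2 measured in the transformed space (an assumed
    definition of "strength"), or None if the data cannot be transformed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if kind == "linear":
        u, v = x, y
    elif kind == "power":
        if np.any(x <= 0) or np.any(y <= 0):
            return None
        u, v = np.log(x), np.log(y)
    elif kind == "exponential":
        if np.any(y <= 0):
            return None
        u, v = x, np.log(y)
    else:
        raise ValueError(f"unknown relationship type: {kind}")
    slope, intercept = np.polyfit(u, v, 1)  # coefficients, highest power first
    r2 = _r2(v, intercept + slope * u)
    a = intercept if kind == "linear" else np.exp(intercept)  # undo the log on a
    return float(a), float(slope), r2


def best_fit(x, y, kinds=("linear", "power", "exponential")):
    """Return (kind, a, b, strength) for the selected type with the highest R^2."""
    results = []
    for kind in kinds:
        fit = fit_linearised(x, y, kind)
        if fit is not None:
            results.append((kind, *fit))
    if not results:
        return None  # none of the selected types could be fitted to this pair
    return max(results, key=lambda r: r[3])


x = np.arange(1.0, 11.0)
print(best_fit(x, 3.0 * x ** 2))  # power law with a ~ 3, b ~ 2, strength ~ 1
```

Scoring in the linearised space keeps every fit a plain straight-line regression; comparing R² on the original scale of y instead would be an equally reasonable assumption.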

Application

If you are working with a large dataset containing many variables, this tool is a good starting point for selecting the input variables that are likely to influence your outputs, which reduces the size of the design space. It can also be used to identify input variables that depend on each other and may therefore be redundant. The drawback of this tool is that it only looks for one-to-one relationships, so it may miss multivariate interactions that would be important to include in a Machine Learning model.
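
As an illustration of screening inputs for redundancy, the hypothetical sketch below flags pairs of input columns whose linear strength (the squared Pearson correlation, which equals the R² of a straight-line fit) reaches a threshold. The 0.9 threshold, the column names and the `redundant_pairs` helper are assumptions for illustration only, and for brevity only the linear relationship type is checked.

```python
import itertools

import numpy as np


def redundant_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r2) for input-column pairs with linear R^2 >= threshold.

    columns: dict mapping a column name to a 1-D array of values (all the same length).
    Only the linear relationship type is checked; R^2 of a straight-line fit is r^2.
    """
    flagged = []
    for a, b in itertools.combinations(columns, 2):
        r = np.corrcoef(columns[a], columns[b])[0, 1]
        if r * r >= threshold:
            flagged.append((a, b, float(r * r)))
    return flagged


# Hypothetical inputs: the same thickness recorded in two units, plus an unrelated input.
rng = np.random.default_rng(0)
thickness = rng.uniform(1.0, 5.0, 200)
inputs = {
    "thickness_mm": thickness,
    "thickness_in": thickness / 25.4,
    "cure_temperature": rng.uniform(20.0, 80.0, 200),
}
print(redundant_pairs(inputs))  # flags only the thickness_mm / thickness_in pair
```

A flagged pair suggests that one of the two columns could be dropped, but whether it is truly redundant is a judgement about the physics, not something the strength value alone decides.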

How to use

The output of this step is a heatmap in which the colour intensity of each square represents the strength of the strongest relationship found between that pair of variables.
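
A minimal sketch of how such a grid of strengths could be assembled, assuming strength is the R² of a least-squares line in each relationship type's linearised space (for a straight-line fit, R² equals the squared correlation coefficient r²). Rows follow the Y columns (vertical axis) and columns follow the X columns (horizontal axis); each cell keeps the highest strength across the relationship types. `strength` and `strength_grid` are hypothetical names, and mapping the values to colours is left to whichever plotting library you use.

```python
import numpy as np

# kind -> (transform applied to x, transform applied to y, check that the transform is valid)
TRANSFORMS = {
    "linear": (lambda x: x, lambda y: y, lambda x, y: True),
    "power": (np.log, np.log, lambda x, y: bool(np.all(x > 0) and np.all(y > 0))),
    "exponential": (lambda x: x, np.log, lambda x, y: bool(np.all(y > 0))),
}


def strength(x, y, kind):
    """Assumed strength: R^2 (= r^2) of a straight line in the linearised space of `kind`."""
    fx, fy, valid = TRANSFORMS[kind]
    if not valid(x, y):
        return 0.0  # e.g. a power law cannot be fitted to non-positive data
    u, v = fx(x), fy(y)
    if np.std(u) == 0 or np.std(v) == 0:
        return 0.0  # a constant column has no trend to measure
    r = np.corrcoef(u, v)[0, 1]
    return float(r * r)


def strength_grid(data, x_cols, y_cols, kinds=("linear", "power", "exponential")):
    """Rows follow y_cols (vertical axis), columns follow x_cols (horizontal axis).

    Each cell holds the strongest R^2 found among the selected relationship types.
    """
    grid = np.zeros((len(y_cols), len(x_cols)))
    for i, y_col in enumerate(y_cols):
        y = np.asarray(data[y_col], dtype=float)
        for j, x_col in enumerate(x_cols):
            x = np.asarray(data[x_col], dtype=float)
            grid[i, j] = max(strength(x, y, kind) for kind in kinds)
    return grid


x = np.linspace(1.0, 5.0, 50)
data = {"x": x, "y_linear": 4.0 * x - 2.0, "y_power": 0.5 * x ** 3}
print(strength_grid(data, ["x"], ["y_linear", "y_power"]))  # both cells ~1.0
```

Taking the maximum over types is what lets a single colour per square stand for "the strongest relationship found", whichever type produced it.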

To generate the heatmap, fill in the following fields:

X column field	Select the columns you want to appear along the horizontal axis of the heatmap. Typically these are your input variables.
Y column field	Select the columns you want to appear along the vertical axis of the heatmap. Typically these are your output variables. You can select any number of columns from your dataset in either field, and the same column can appear in both the X columns and Y columns fields. Alternatively, enter the wildcard `*` in both fields to select all columns for both axes. Rendering the heatmap can take a long time when many columns are selected, so we advise using the wildcard only if your dataset has fewer than 20 columns; with more columns than that, be selective about the columns you choose.
Types	The types of relationship to check for between the variables. The choices are Linear, Power Law and Exponential; all are selected by default.
Minimum strength	The minimum correlation strength that the Intelligent Correlation tool will plot. Increase this value to show only the stronger relationships in your data. (`Default = 0`)
Percentage of points required for fit	Anomalies or regions of abnormal data can degrade the lines of best fit found between pairs of variables. To reduce their impact while still capturing the trend in the rest of the data, adjust the percentage of points used to calculate the lines of best fit. (`Default = 80%`)

Then click Apply to generate the heatmap.
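
How the Percentage of points required for fit setting is applied is not described here. One plausible approach, shown only as an assumption, is a trimmed fit: fit all points, keep the given fraction with the smallest residuals, and refit on that subset. The sketch uses a linear fit; the same idea would carry over to the log-transformed space of power-law and exponential fits. `trimmed_linear_fit` is a hypothetical helper.

```python
import numpy as np


def trimmed_linear_fit(x, y, fraction=0.8, iterations=5):
    """Fit y = a + b*x using only the `fraction` of points that agree best with the fit.

    Starts from an ordinary least-squares fit on every point, then repeatedly keeps the
    points with the smallest absolute residuals and refits on that subset.
    Returns (a, b, r2), with r2 computed on the retained points only.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_keep = max(2, int(np.ceil(fraction * len(x))))  # a line needs at least two points
    keep = np.arange(len(x))                          # begin with every point
    for _ in range(iterations):
        b, a = np.polyfit(x[keep], y[keep], 1)
        residuals = np.abs(y - (a + b * x))            # measured on all points
        keep = np.argsort(residuals)[:n_keep]          # retain the best-fitting subset
    b, a = np.polyfit(x[keep], y[keep], 1)            # final fit on the retained points
    r = np.corrcoef(x[keep], y[keep])[0, 1]
    return float(a), float(b), float(r * r)


x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[[3, 11, 17]] += 40.0  # three anomalous readings
print(trimmed_linear_fit(x, y, fraction=1.0)[2])  # all points: strength pulled well below 1
print(trimmed_linear_fit(x, y, fraction=0.8)[2])  # 80% of points: strength ~1.0
```

With all points, the three anomalous readings drag the strength down; keeping 80% of the points discards them, so the fit recovers the underlying line y = 2x + 1. The resulting strength would then be compared against the Minimum strength setting to decide whether the square is plotted.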

Once the step has finished, you can click any square in the heatmap to display, below the heatmap, a scatter plot of the two variables that square represents, together with the associated line of best fit. To hide the scatter plot, click the same square again.

Examples

Here is an example of the heatmap generated for a set of variables describing composite materials, with all columns selected for both the horizontal and vertical axes.

As highlighted by the yellow circles above, there is a strong relationship between the Initial Stiffness and Vc, i.e. the ratio of carbon to glass fibres in the composite. The yield strength is also highly correlated with Vc, but less so than the Initial Stiffness, as shown by its lighter shade of red. The pseudo-ductile-strain and the ultimate-strain are also closely related.

Clicking one of these squares produces a 2D scatter plot below the heatmap showing all the data points in the dataset for the two columns represented by that square, together with the best-fit line that generated the strength value, so that you can interrogate the underlying data.

More on this step

If you have more than 10 variables in the X or Y column field, we recommend viewing the heatmap in full-screen mode using the button in the top right of the step.

Remember that Intelligent Correlation only looks at the raw data and plots the line of best fit between each pair of variables. It will not identify complex physical relationships that are not linear, power law or exponential, nor multivariate interactions.

To identify these more complex relationships in your data, we recommend training a model and using Apply > Sensitivity Analysis. See the article on Sensitivity Analysis.
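
To illustrate why pairwise checks can miss multivariate interactions, the sketch below builds an output y = x1 * x2 that is fully determined by two inputs acting together. Because both inputs are drawn symmetrically around zero, neither shows a one-to-one trend with y. Only the linear strength (squared correlation) is computed, since y takes negative values and so power-law and exponential fits would not apply. The synthetic data and variable names are illustrative, and the least-squares model with an interaction term merely stands in for a trained model; it is not the platform's Sensitivity Analysis.

```python
import numpy as np

rng = np.random.default_rng(42)
x1 = rng.uniform(-1.0, 1.0, 1000)
x2 = rng.uniform(-1.0, 1.0, 1000)
y = x1 * x2  # y depends only on the two inputs acting together


def linear_r2(u, v):
    """Squared Pearson correlation, i.e. R^2 of a straight-line fit of v on u."""
    r = np.corrcoef(u, v)[0, 1]
    return float(r * r)


# Pairwise view: neither input on its own shows a one-to-one trend with y.
pairwise = (linear_r2(x1, y), linear_r2(x2, y))
print(pairwise)  # both close to 0

# A model that sees both inputs together, including their product, explains y fully.
design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
residual = y - design @ coef
model_r2 = 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)
print(model_r2)  # ~1.0
```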