Distribution

Modified on Fri, 22 Sep, 2023 at 12:23 PM

Description

This function allows you to either plot the distribution of a numerical column as a histogram or create a table with the count for each distinct value of a categorical column.

Application

If you want to check the distribution of your dataset visually, you can use this function to create a histogram for each numerical column in your dataset.

For categorical columns, the function offers the same functionality as Value Counts. Value Counts analyses only one column at a time and provides the result as a separate dataset for further use. In contrast, Distribution can analyse all columns in your dataset in a single step but provides no output: the resulting tables are displayed but not provided as separate datasets.
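To make the idea of a count table concrete, here is a minimal Python sketch of what counting distinct values for several categorical columns amounts to. It is not the platform's implementation. The function names (value_counts, distribution_tables), the column names, and the rows are hypothetical and made up for illustration.

```python
from collections import Counter

# Hypothetical rows; column names and values are made up for illustration.
rows = [
    {"material": "A", "supplier": "X"},
    {"material": "B", "supplier": "X"},
    {"material": "A", "supplier": "Y"},
    {"material": "A", "supplier": "X"},
]


def value_counts(rows, column):
    """Count how often each distinct value occurs in one column."""
    return Counter(row[column] for row in rows)


def distribution_tables(rows, columns):
    """Build one count table for every selected column."""
    return {column: value_counts(rows, column) for column in columns}


tables = distribution_tables(rows, ["material", "supplier"])
for column, counts in tables.items():
    print(column)
    # most_common() lists the values from most to least frequent.
    for value, count in counts.most_common():
        print(f"  {value}: {count}")
```

The first helper mirrors the one-column-at-a-time behaviour described for Value Counts, and the second loops over all selected columns, as Distribution does.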

How to use

Select the dataset to analyse in the field Data.
Select the Columns to analyse. You must select at least one column and can select up to all columns in the selected dataset.
Select the number of bars (bins) used in the histogram plots (Number of bars on bar plots). This setting applies to all plots the function produces; you can’t change it for individual plots.

Histogram with 100 bars

Same data but plotted with only 40 bars
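The effect of the number of bars can be sketched in a few lines of Python. This is an illustration under assumptions, not the platform's implementation: it assumes equal-width bins between the smallest and largest value, and the helper histogram_counts and the measurements are hypothetical.

```python
def histogram_counts(values, number_of_bars):
    """Split the value range into equal-width bins and count the values in each."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / number_of_bars
    counts = [0] * number_of_bars
    for value in values:
        if width == 0:
            # All values are identical, so everything falls into the first bin.
            index = 0
        else:
            # The maximum value is placed in the last bin instead of a new one.
            index = min(int((value - lowest) / width), number_of_bars - 1)
        counts[index] += 1
    return counts


# Hypothetical measurements, made up for illustration.
values = [1.0, 1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6, 5.0]

coarse = histogram_counts(values, 2)  # few, wide bars
fine = histogram_counts(values, 4)    # more, narrower bars
print(coarse)
print(fine)
```

Both results describe the same data. Fewer bars smooth out detail, while more bars show finer structure but leave fewer values in each bar.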

If you enable the option Show statistics, the Median and the Mean value of the distribution are shown in the plots as white bars. This only works for numerical columns.

Data without statistics

Plot for same data with Show statistics enabled
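As a rough illustration of the two statistics, the following Python sketch computes the mean and the median with the standard library. The measurements are hypothetical and chosen to be skewed, and the sketch does not claim to reproduce how the platform calculates the values.

```python
from statistics import mean, median

# Hypothetical measurements with one large outlier, made up for illustration.
values = [1.0, 1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6, 12.0]

mean_value = mean(values)      # arithmetic average, pulled up by the outlier
median_value = median(values)  # middle of the sorted values, robust to the outlier

print(f"mean:   {mean_value:.2f}")
print(f"median: {median_value:.2f}")
```

When the two markers lie close together, the distribution is roughly symmetric. A clear gap between them, as in this made-up data, points to a skewed distribution or to outliers.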

Click Apply to run the analysis and generate the plots.

If you selected more than one column, all histogram plots and tables are arranged side by side. If you don’t see all results, you can scroll horizontally.

Examples

The plots above show the result for a column that contains numerical data. They are taken from the composite dataset used in Challenge 1 - Composite Materials in the tutorials section and show the distribution of the parameter ultimate strain.

The same dataset contains two categorical columns, carbon fibre and glass fibre. If you apply Distribution to these two columns, the result looks like this: