Box and Whisker

Modified on Tue, 4 Apr, 2023 at 3:21 PM

Description

This function enables you to plot the distribution of your data as a box and whisker plot. These plots are also sometimes referred to as box plots.


Application

Box and whisker plots are useful for showing the distribution of data points across a selected measure. The box plots display the range of values within each measured variable, so you can easily compare the differences between variables.

The plot shows the median, the outliers, and where the majority of the data points lie (the “box”). These visualisations are helpful for comparing the distributions of many variables against each other.


How to use

  • Assign the dataset you want to plot to the field Data.
  • Choose the Column for which you want to plot the distribution.
  • Click Apply to create the box and whisker plot.

There are further options to customise the plot:

Interactive
Change the column for which the distribution is shown interactively without going into edit mode.
Group By
Show the distribution of a Column grouped by another variable. See example below for more details.
Horizontal
By default, the box and whisker plot is plotted vertically. This can be changed to horizontal.

Show data
By default, only the box and whiskers are plotted. If you enable this option, all data points are additionally plotted as white dots. Enabling it also changes how the whiskers are calculated; see below for more details.

Plot all N data points?
Refer to this article on limiting the number of points in a plot. If you have more than 10,000 data points in your dataset and don’t enable this option, the statistics (median, q1, q3, min, max) are calculated only for a random subset of 10,000 data points. In general, the statistics of the subset should be close to those of the original dataset, but how close depends on the size and distribution of the original dataset. Therefore, treat a box and whisker plot based on a subset as an approximation of the original statistics.
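
To get a feel for how close a subset-based plot is likely to be, the following minimal Python sketch compares the statistics of a full column with those of a random subset of 10,000 points. The data is synthetic, the helper name five_number_summary is made up for this illustration, and the quartile method used here (the "inclusive" method of Python's statistics module) is an assumption; the platform may calculate quartiles differently.

```python
import random
import statistics


def five_number_summary(values):
    """Return min, q1, median, q3 and max for a list of numbers."""
    # Assumption: "inclusive" quartiles; the platform's method is not documented here.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for a large column (not real data).
full = [rng.gauss(50.0, 10.0) for _ in range(100_000)]
subset = rng.sample(full, 10_000)  # random subset of 10,000 points

full_stats = five_number_summary(full)
subset_stats = five_number_summary(subset)

for name in full_stats:
    print(f"{name:>6}: full={full_stats[name]:8.3f}  subset={subset_stats[name]:8.3f}")
```

The median and quartiles of the subset usually land close to the full values, whereas min and max are the least reliable: a random subset can easily miss the most extreme points of the original dataset.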

Examples

The dataset used for this example is from Challenge 1 - Composite Materials in the Getting Started section.

This first plot shows the distribution of the fibre length for the entire dataset. On its own, this does not tell us much.

We can use the Group By option to plot the fibre length distribution for each Carbon fibre separately. That way we can quickly see how our design space sampling differs for each Carbon fibre.
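
Conceptually, Group By splits the column by the values of the grouping variable and calculates one set of box statistics per group. The minimal Python sketch below illustrates that idea. The rows, the labels fibre_1 and fibre_2, and the "inclusive" quartile method are assumptions made for illustration; they are not taken from the Composite Materials dataset or from the platform's implementation.

```python
import statistics
from collections import defaultdict

# Hypothetical rows of (group label, measured value); illustrative only.
rows = [
    ("fibre_1", 2.0), ("fibre_1", 4.0), ("fibre_1", 6.0),
    ("fibre_2", 10.0), ("fibre_2", 20.0), ("fibre_2", 30.0),
    ("fibre_2", 40.0), ("fibre_2", 50.0),
]

# Collect the measured values of each group.
groups = defaultdict(list)
for label, value in rows:
    groups[label].append(value)

# One set of box statistics per group.
summary = {}
for label, values in groups.items():
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    summary[label] = {
        "n": len(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

for label, stats in summary.items():
    print(label, stats)
```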

If we enable Show data, the differences in the distributions and in the number of data points become even more obvious.


More on this step

How to get the exact numbers of your box plot

If you move your mouse over a box, the underlying values are shown.

How to read a Box and Whisker plot

  • The box shows where about 50% of the data points of the dataset fall.
  • The median splits the dataset into two subsets with the same number of data points and is shown by a line splitting the box into two parts.
  • The upper limit of the box is the upper quartile (q3). 75% of all data points are below this point. That is, 25% of the data points fall in the upper part of the box between median and q3.
  • The lower limit of the box is the lower quartile (q1). 25% of all data points are below this point. That is, 25% of the data points fall in the lower part of the box between q1 and median.
  • By default, the whiskers extend to the maximum and minimum values in the dataset. This changes when Show data is enabled.

The plot tells you in which range most of your data falls and what range is covered at all.
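
The values described above can be reproduced by hand. The following minimal Python sketch calculates them for a small list of illustrative numbers, assuming the "inclusive" quartile method of Python's statistics module (the platform may use a different method, so its numbers can differ slightly).

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values only

# q1, median and q3 define the box and the line inside it.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Points that fall inside the box (between q1 and q3, inclusive).
in_box = [v for v in values if q1 <= v <= q3]
share_in_box = len(in_box) / len(values)

print(f"box: {q1} to {q3}, median {median}")        # box: 3.0 to 7.0, median 5.0
print(f"whiskers (min/max): {min(values)} to {max(values)}")
print(f"share of points inside the box: {share_in_box:.0%}")
```

With only nine points, the share inside the box is 5 of 9 rather than exactly 50%, which is why the box is described as holding about 50% of the data.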

How “Show data” changes the plot behavior

Enabling the Show data option changes the behavior of the whiskers. See the example below, which is also from the Composite Materials challenge. In this case, the distributions of Ultimate strain for several carbon fibers are plotted. After enabling the Show data option, the length of the whiskers changes for three of the carbon fibers. The whiskers are calculated differently in the two cases.

Show data disabled
The whiskers show the min and max values occurring in the dataset.

Show data enabled
The whiskers are calculated as 1.5 times the Interquartile Range (IQR). The IQR is defined by the size of the box as follows:

  • Lower Interquartile Range = Median - q1
  • Upper Interquartile Range = q3 - Median

All points lying beyond the whiskers are considered outliers.
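
The calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's implementation: each whisker is assumed to be measured from the edge of the box (q1 or q3), which this article does not state explicitly, and the quartiles use the "inclusive" method of Python's statistics module. The function names whisker_limits and find_outliers are made up for this example.

```python
import statistics


def whisker_limits(values):
    """Whisker limits using the lower and upper IQR described in this article.

    Assumption: each whisker is measured from the edge of the box (q1 or q3).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower_iqr = median - q1
    upper_iqr = q3 - median
    lower_limit = q1 - 1.5 * lower_iqr
    upper_limit = q3 + 1.5 * upper_iqr
    return lower_limit, upper_limit


def find_outliers(values):
    """Return the points that lie beyond the whisker limits."""
    lower_limit, upper_limit = whisker_limits(values)
    return [v for v in values if v < lower_limit or v > upper_limit]


values = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # illustrative values only

print(whisker_limits(values))  # (0.0, 10.0)
print(find_outliers(values))   # [30]
```

Many other tools use a single IQR = q3 - q1 and end each whisker at the most extreme data point within 1.5 times that range from the box, so whisker lengths may not match between tools.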

Box and whisker on categorical data

The box and whisker plot only makes sense for numerical data. You can still select a categorical Column, but instead of a plot you will get a table with a value count for each distinct value within that column. This is the same result you get from Value Counts.
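
As an illustration of what such a value count table contains, the minimal Python sketch below counts the distinct values of a hypothetical categorical column. The column values are made up, and the code is not the platform's implementation.

```python
from collections import Counter

# Hypothetical categorical column; illustrative values only.
column = ["A", "B", "A", "C", "A", "B"]

# Count how often each distinct value occurs.
counts = Counter(column)

# Print the table, most frequent value first.
for value, count in counts.most_common():
    print(f"{value}: {count}")
```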
