Description
This function plots the distribution of your data as a box and whisker plot, also known as a box plot.
Application
Box and whisker plots are useful for showing how data points are distributed across a selected measure. Each plot displays the range of values for a variable, so you can easily compare variables with each other.
The plot shows the median, the minimum and maximum, the outliers (when Show data is enabled), and where the middle half of the data points lie (the “box”). These visualisations are helpful for comparing the distributions of many variables against each other.
How to use
- Assign the dataset you want to plot to the field Data.
- Choose the Column for which you want to plot the distribution.
- Click Apply to create the box and whisker plot.
There are further options to customise the plot:
Change the column for which the distribution is shown interactively without going into edit mode. | |
Group By | Show the distribution of a Column grouped by another variable. See example below for more details. |
Horizontal | By default, the box and whisker plot is plotted vertically. This can be changed to horizontal. |
Show data | By default, only the box and whiskers are plotted. If you enable this option, all data points are additionally plotted as white dots. Enabling it also changes how the whiskers are calculated; see below for more details. |
Plot all N data points? | Refer to this article on limiting the number of points in a plot. If your dataset has more than 10,000 data points and you don’t enable this option, the statistics (median, q1, q3, min, max) are calculated only for a random subset of 10,000 data points. In general, the statistics of the subset should be close to those of the original dataset, but how close depends on the size and distribution of the original dataset. Extreme values in particular (min, max) can be missed by the subset. Therefore, treat a box and whisker plot based on a subset as an approximation of the original statistics. |
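To illustrate why statistics from a 10,000-point subset are only an approximation, here is a minimal sketch in Python. It is not the platform's implementation: the synthetic Gaussian data, the uniform random sampling without replacement, the helper name `summarise` and the "inclusive" quartile convention are all assumptions made for illustration.

```python
import random
import statistics

MAX_POINTS = 10_000  # subset size mentioned in the text above


def summarise(values):
    """Return min, q1, median, q3 and max (hypothetical helper)."""
    # Assumption: "inclusive" quartiles; other conventions differ slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic stand-in for a large dataset column (assumption).
full = [rng.gauss(50.0, 10.0) for _ in range(100_000)]
# Assumption: the subset is drawn uniformly at random without replacement.
subset = rng.sample(full, MAX_POINTS)

full_stats = summarise(full)
subset_stats = summarise(subset)
for name in full_stats:
    print(f"{name:>6}: full={full_stats[name]:8.3f}  subset={subset_stats[name]:8.3f}")
```

In a run like this, the median and quartiles of the subset typically land close to those of the full data, while the subset's min and max can only be equal to or less extreme than the true values.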
Examples
The dataset used for this example is from Challenge 1 - Composite Materials in the Getting Started section.
This first plot shows the distribution of the fibre length for the entire dataset. On its own, this does not tell us much.
We can use the Group By option to plot the fibre length distribution for each Carbon fibre separately. That way we can quickly see how our design space sampling differs for each Carbon fibre.
If we enable Show data, the differences in the distributions and in the number of data points become even more obvious.
How to get the exact numbers of your box plot
If you move your mouse over a box, the underlying values are shown.
How to read a Box and Whisker plot
- The box shows where the middle 50% of the data points of the dataset fall.
- The median splits the dataset into two subsets with the same number of data points and is shown by a line splitting the box into two parts.
- The upper limit of the box is the upper quartile (q3). 75% of all data points are below this point. That is, 25% of the data points fall in the upper part of the box between the median and q3.
- The lower limit of the box is the lower quartile (q1). 25% of all data points are below this point, and another 25% fall in the lower part of the box between q1 and the median.
- By default (Show data disabled), the whiskers extend to the maximum and minimum values in the dataset.
The plot tells you in which range most of your data falls and which total range the data covers.
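As a rough illustration of the five values described above, the following Python sketch computes them for a small made-up list. The function name `five_number_summary`, the sample values and the "inclusive" quartile convention are assumptions for illustration; the platform may compute quartiles with a different convention, which can shift q1 and q3 slightly.

```python
import statistics


def five_number_summary(values):
    """Return (min, q1, median, q3, max) for a list of numbers."""
    ordered = sorted(values)
    # Assumption: "inclusive" quartiles (the data is treated as the whole
    # population). Other conventions give slightly different q1 and q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


fibre_lengths = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up values
summary = five_number_summary(fibre_lengths)
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```

Here the box would span 3 to 7, the median line would sit at 5, and the default whiskers would reach down to 1 and up to 9.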
How “Show data” changes the plot behavior
Enabling the Show data option changes the behavior of the whiskers. See the example below, which is also from the Composite Materials challenge. Here the distributions of Ultimate strain for several carbon fibres are plotted. After enabling the Show data option, the length of the whiskers changes for three of the carbon fibres, because the whiskers are calculated differently in the two cases.
Show data disabled | The whiskers show the min and max values occurring in the dataset. |
Show data enabled | The whiskers extend at most 1.5 times the interquartile range (IQR) beyond the box. The IQR is the size of the box: IQR = q3 - q1. All points lying beyond the whiskers, that is, below q1 - 1.5 × IQR or above q3 + 1.5 × IQR, are considered outliers. |
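The 1.5 × IQR rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's code: the function name `iqr_whiskers`, the sample values and the "inclusive" quartile convention are hypothetical, and the sketch follows the common convention of ending each whisker at the most extreme data point that still lies inside the limit.

```python
import statistics


def iqr_whiskers(values, k=1.5):
    """Return the IQR, whisker ends and outliers for a list of numbers."""
    ordered = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions differ slightly.
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    inside = [v for v in ordered if lower_limit <= v <= upper_limit]
    outliers = [v for v in ordered if v < lower_limit or v > upper_limit]
    # Assumption: each whisker ends at the most extreme point inside the limits.
    return {"iqr": iqr, "whisker_low": inside[0],
            "whisker_high": inside[-1], "outliers": outliers}


ultimate_strain = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # made-up values
result = iqr_whiskers(ultimate_strain)
print(result)
```

For these values q1 is 3 and q3 is 7, so the IQR is 4 and the limits are -3 and 13. The upper whisker stops at 8 and the value 30 is reported as an outlier, whereas a min/max whisker would have extended all the way to 30.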
Box and whisker on categorical data
The box and whisker plot only makes sense for numerical data. You can still select a categorical Column, but instead of a plot you will get a table with a value count for each distinct value within that column. This is the same result you get from Value Counts.
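A value count table of this kind can be approximated with Python's standard library. The column values below are made up, and the sketch only illustrates the idea of counting distinct values; it does not reproduce the platform's Value Counts function.

```python
from collections import Counter

# Made-up categorical column (assumption for illustration).
carbon_fibre = ["A", "B", "A", "C", "A", "B"]

counts = Counter(carbon_fibre)
# List each distinct value with its count, most frequent first.
for value, count in counts.most_common():
    print(f"{value}\t{count}")
```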