Box Plot
A box plot is a standardized, graphical way of summarizing the distribution of one or more groups of data so that they can be visualized and compared for further analysis.
What is a Box Plot graph/diagram?
A box plot condenses each group of data into five values (the minimum, first quartile, median, third quartile, and maximum) and displays them in a single box shape per group. This makes it easy to see the spread and distribution of the data collected, and to compare these between groups.
Lines called whiskers extend from the box to show variability beyond the upper and lower quartiles, hence the alternative names box and whisker plot or diagram. Outliers that fall beyond the whiskers can be shown as individual data points on the graph.
The shape of the box plot shows how the data is distributed and reveals any outliers. Because more than one box plot can be drawn per graph, it is also a useful way to compare different sets of data.
Box plots can be aligned with the boxes placed vertically (with groups on the horizontal axis) or horizontally (with groups aligned vertically). Orientating boxes horizontally is helpful when there are a lot of groups to plot, or if those group names are long, as they don’t need to be abbreviated. Orientating boxes vertically works well for other types of data, such as when the grouping variable is based on units of time.
Why are Box Plots used?
Box plots are used to provide at-a-glance, high-level information about a group of data, showing its symmetry, skew, spread and any outliers. Viewers can easily see where the bulk of the data sits, and box plots are easier to read than a line chart when there is a great deal of variability in the dataset. Box plots also enable the comparison of multiple data groups on the same graph, using the same scale.
However, the simplicity of a box plot limits the level of detail it can show. It is not possible to view the detailed shape of a distribution or to spot specific peaks or troughs.
How do you create Box Plot diagrams?
Creating a box plot is a standardized process:
Analyze your data
Arrange your data in numerical order, from the lowest to the highest. Then analyze it to find the five-number summary:
- The minimum (Q0 or 0th percentile): the lowest data point in the data set excluding any outliers
- The maximum (Q4 or 100th percentile): the highest data point in the data set excluding any outliers
- The median (Q2 or 50th percentile): the middle value in the data set
- First quartile (Q1 or 25th percentile): also known as the lower quartile. This is the median of the lower half of the dataset.
- Third quartile (Q3 or 75th percentile): also known as the upper quartile. This is the median of the upper half of the dataset.
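The five-number summary described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `five_number_summary` is hypothetical, the quartiles follow the "median of each half" definition given in the list (leaving the middle value out of both halves when the count is odd, which is one of several common conventions), and the minimum and maximum are taken over all values, so outliers are not yet excluded here.

```python
from statistics import median


def five_number_summary(values):
    """Return the five-number summary of a non-empty list of numbers.

    Assumption: quartiles are the medians of the lower and upper halves;
    for an odd number of values the middle value is left out of both halves.
    """
    data = sorted(values)  # arrange from lowest to highest
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")

    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]

    return {
        "min": data[0],
        # With a single value both halves are empty, so fall back to it.
        "q1": median(lower) if lower else data[0],
        "median": median(data),
        "q3": median(upper) if upper else data[-1],
        "max": data[-1],
    }


summary = five_number_summary([49, 7, 41, 15, 36, 43, 39, 40, 42, 47])
print(summary)
# {'min': 7, 'q1': 36, 'median': 40.5, 'q3': 43, 'max': 49}
```

Other quartile conventions (for example, interpolating between neighbouring values) can give slightly different Q1 and Q3 for the same data, so results may not match every charting tool exactly.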
Create your graph
Start drawing the graph by creating a relevant, labeled and scaled axis (either vertical or horizontal). Then, based on the five-number summary, draw a box that extends from the first quartile to the third quartile. This indicates the range of the central 50% of the data. Add a line inside the box at the position of the median; it sits in the middle of the box only when the data is symmetric.
After this, draw lines (or whiskers) from either end of the box to the minimum and maximum values, excluding any outliers. Finally, plot any outliers beyond the whiskers as individual dots or points.
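The text above does not say how to decide which points count as outliers. One widely used convention, assumed here purely for illustration, is the 1.5 × IQR rule: any point more than 1.5 times the interquartile range (Q3 minus Q1) below Q1 or above Q3 is treated as an outlier, and each whisker ends at the most extreme point that is not. The sketch below applies that rule; the function name `whiskers_and_outliers` and the parameter `k` are hypothetical.

```python
def whiskers_and_outliers(values, q1, q3, k=1.5):
    """Split values into whisker ends and outliers.

    Assumption: the 1.5 x IQR rule (k=1.5) defines the outlier fences;
    the source text does not prescribe a specific rule.
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)

    return {
        "lower_whisker": min(inside),  # lowest point that is not an outlier
        "upper_whisker": max(inside),  # highest point that is not an outlier
        "outliers": outliers,
    }


data = [1, 10, 11, 12, 13, 14, 15, 30]
# Q1 = 10.5 and Q3 = 14.5 are the medians of the lower and upper halves.
result = whiskers_and_outliers(data, q1=10.5, q3=14.5)
print(result)
# {'lower_whisker': 10, 'upper_whisker': 15, 'outliers': [1, 30]}
```

In this example the box spans 10.5 to 14.5, the whiskers reach 10 and 15, and the values 1 and 30 would be plotted as separate points.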