
Box Plot

A box plot is a standardized, graphical way of summarizing the distribution of a set of data groups and visualizing them for further analysis.

What is a Box Plot graph/diagram?

A box plot is a standardized, graphical way of summarizing the distribution of multiple sets of data. It enables the display of five different values – the minimum, first quartile, median, third quartile, and maximum – in a single box shape for each group. A box plot therefore makes it easy to visualize and understand the spread of data collected and its distribution, and to compare these between groups.

Lines called whiskers extend from the box to show variability beyond the upper and lower quartiles, hence the alternative names box and whisker plot or box and whisker diagram. Outliers that fall beyond the whiskers can be shown as individual data points on the graph.

The shape of the box plot shows how the data is distributed and whether there are any outliers. Because more than one box plot can be drawn on the same graph, it is also a useful way to compare different sets of data.

Box plots can be aligned with the boxes placed vertically (with groups on the horizontal axis) or horizontally (with groups aligned vertically). Orientating boxes horizontally is helpful when there are a lot of groups to plot, or if those group names are long, as they don’t need to be abbreviated. Orientating boxes vertically works well for other types of data, such as when the grouping variable is based on units of time.

Why are Box Plots used?

Box plots are used to provide at-a-glance, high-level information about a group of data, showing its symmetry, skew, variance and any outliers. Viewers can easily see where the main bulk of the data sits, and box plots are easier to read than a line chart when there is a great deal of variability in the dataset. Box plots also enable the comparison of multiple data groups on the same graph, using the same scale.

However, the simplicity of a box plot limits how much detail it can show. Because each group is reduced to five summary values plus outliers, it is not possible to view the detailed shape of a distribution or spot specific peaks or troughs.

How do you create Box Plot diagrams?

Creating a box plot is a standardized process:

Analyze your data

Arrange your data in numerical order, from the lowest to the highest. Then analyze it to find the five-number summary:

  • The minimum (Q0 or 0th percentile): the lowest data point in the data set excluding any outliers
  • First quartile (Q1 or 25th percentile): also known as the lower quartile. This is the median of the lower half of the data set.
  • The median (Q2 or 50th percentile): the middle value in the data set
  • Third quartile (Q3 or 75th percentile): also known as the upper quartile. This is the median of the upper half of the data set.
  • The maximum (Q4 or 100th percentile): the highest data point in the data set excluding any outliers
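
As an illustration, the five-number summary described above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function name `five_number_summary` is hypothetical, and it assumes that the middle value is left out of both halves when the data set has an odd number of points (other quartile conventions exist and give slightly different results). It reports the raw lowest and highest values and does not exclude outliers.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a non-empty list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    with the middle value excluded from both halves when the count is odd.
    """
    data = sorted(values)          # arrange from lowest to highest
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")

    half = n // 2
    lower = data[:half]            # values below the middle position
    upper = data[half + n % 2:]    # values above the middle position

    # A single value has no halves, so fall back to the value itself.
    q1 = median(lower) if lower else data[0]
    q3 = median(upper) if upper else data[-1]
    return data[0], q1, median(data), q3, data[-1]


sample = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
print(five_number_summary(sample))  # prints (7, 36, 40.5, 43, 49)
```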

Create your graph

Start drawing the graph by creating a relevant, labeled and scaled axis (either vertical or horizontal). Then, based on the five-number summary, draw a box that extends from the first quartile to the third quartile. This box covers the central 50% of the data. Add a line across the box at the median; it will sit off-center within the box if the data is skewed.

After this, draw lines (or whiskers) from either side of the box out to the minimum and maximum values, excluding any outliers. Finally, plot any outliers beyond the whiskers as individual dots/points.
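
The text does not say how to decide which points count as outliers. One widely used convention, assumed here, treats any point more than 1.5 times the interquartile range (Q3 minus Q1) beyond the box as an outlier. The sketch below computes the values needed to draw one box under that assumption; the function name `whiskers_and_outliers` and the parameter `k` are hypothetical, and the quartiles are taken as the medians of the lower and upper halves.

```python
from statistics import median


def whiskers_and_outliers(values, k=1.5):
    """Return the box edges, median, whisker ends and outliers for one group.

    Assumption: a point is an outlier if it lies more than k * IQR
    below Q1 or above Q3 (k = 1.5 is a common convention).
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")

    half = n // 2
    q1 = median(data[:half])              # median of the lower half
    q3 = median(data[half + n % 2:])      # median of the upper half
    iqr = q3 - q1

    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr

    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]

    return {
        "box": (q1, q3),                      # box spans Q1 to Q3
        "median": median(data),               # line drawn inside the box
        "whiskers": (inside[0], inside[-1]),  # lowest/highest non-outlier points
        "outliers": outliers,                 # plotted as individual points
    }


sample = [52, 55, 57, 58, 60, 61, 63, 64, 66, 95]
print(whiskers_and_outliers(sample))
# prints {'box': (57, 64), 'median': 60.5, 'whiskers': (52, 66), 'outliers': [95]}
```

In this sample the value 95 lies above the upper fence of 74.5, so it would be drawn as a separate point and the upper whisker would stop at 66.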

 
