Box Plots

This page is a tutorial on box plots in Matplotlib.

A box plot, also known as a box-and-whisker plot, box chart, or box graph, is a graphical representation of the distribution of a dataset. It displays key summary statistics, including the median, the quartiles, and potential outliers.

A typical box plot is constructed as follows.

Median (Q2) – a line or symbol inside the box represents the median value of the dataset.

Quartiles (Q1 and Q3) – the box itself represents the interquartile range (IQR). This is the range between the first quartile (Q1) and the third quartile (Q3). The lower edge of the box represents Q1, and the upper edge represents Q3.

Whiskers – lines that extend from the edges of the box to show the range of the remaining data. By default, Matplotlib draws each whisker to the most extreme data point that lies within 1.5 times the IQR of the box edge. Other conventions draw the whiskers to the minimum and maximum values, or to chosen percentiles.

Outliers – data points that fall beyond the whiskers are treated as potential outliers and are plotted individually.
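The construction above can be sketched in a few lines of plain Python. This is a minimal illustration rather than Matplotlib's own code: the helper name box_plot_stats is hypothetical, and it assumes quartiles computed with linear interpolation and whiskers limited to 1.5 times the IQR, which matches Matplotlib's default settings.

```python
import statistics


def box_plot_stats(values, whis=1.5):
    """Return the numbers a box plot draws for one dataset.

    Hypothetical helper for illustration; assumes whiskers at whis * IQR.
    """
    ordered = sorted(values)

    # Quartiles with linear interpolation between neighbouring data points
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences: the furthest a whisker is allowed to reach
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr

    # Whiskers stop at the most extreme data points inside the fences;
    # anything outside the fences is reported as an outlier
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]

    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


data = [2, 7, 1, 2, 0, 10, 2, 7, 5, 8]
print(box_plot_stats(data))
```

For this sample data the box spans 2 to 7 with the median at 3.5, the whiskers reach 0 and 10, and no point is flagged as an outlier.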

Why use box plots?

Box plots are useful for comparing distributions across groups of data and for visualising the spread and skewness of a single dataset. They provide a concise summary of the central tendency and variability of the data. The key reasons to use box plots are: 1. visualising distributions, 2. identifying outliers, 3. comparing distributions, 4. summarising data, and 5. handling skewed data.

[Figure: a basic representation of a box plot.]

Implementation

Creating box plots with Matplotlib is straightforward. The key function is boxplot().

import matplotlib.pyplot as plt

# Sample data
data = [2, 7, 1, 2, 0, 10, 2, 7, 5, 8]

# Box plot
plt.boxplot(data)

# Title and labels
plt.xlabel('Groups')
plt.ylabel('Values')
plt.title('Box Plot Example')

# Show the plot
plt.show()
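It can help to check what boxplot() has actually drawn. The function returns a dictionary of the line objects that make up the plot, and the sketch below reads the median, whisker tips, and outliers back from it. The variable names are illustrative, and the example uses the object-oriented interface (plt.subplots()) rather than the plt functions; call plt.show() afterwards to display the figure.

```python
import matplotlib.pyplot as plt

data = [2, 7, 1, 2, 0, 10, 2, 7, 5, 8]

fig, ax = plt.subplots()

# boxplot() returns a dict with keys such as "boxes", "medians",
# "whiskers", "caps" and "fliers"; each value is a list of line objects
artists = ax.boxplot(data)

# The median is a horizontal line, so both of its y-values are the median
median_value = float(artists["medians"][0].get_ydata()[0])

# Each box has two whiskers (lower first, then upper); each whisker runs
# from the box edge to its tip, so the second y-value is the tip
lower_tip = float(artists["whiskers"][0].get_ydata()[1])
upper_tip = float(artists["whiskers"][1].get_ydata()[1])

# Outliers ("fliers") are the y-data of one marker-only line per box
outliers = [float(v) for v in artists["fliers"][0].get_ydata()]

print("median:", median_value)
print("whisker tips:", lower_tip, upper_tip)
print("outliers:", outliers)
```

Reading values back this way is a convenient sanity check when the plot does not look as expected, for example when a point you assumed was an outlier sits inside a whisker.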

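We can add more features to make the box plot more readable. Let’s include a grid and a label for the group. The grid comes from the grid() function, and the label from the labels parameter of boxplot(). Note that Matplotlib 3.9 renamed this parameter to tick_labels and deprecated the old name. Below is the code example.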

import matplotlib.pyplot as plt

# Sample data
data = [2, 7, 1, 2, 0, 10, 2, 7, 5, 8]

# Box plot
plt.boxplot(data, labels=['Group 1'])

# Title and labels
plt.xlabel('Groups')
plt.ylabel('Values')
plt.title('Box Plot Example')
plt.grid(True)

# Show the plot
plt.show()

Let’s further customise the box plot. There are many ways to change its appearance. Here, we change the colour and line style of the box and of the median line through the boxprops and medianprops parameters of boxplot(), respectively. Each parameter takes a Python dictionary of line properties; in the code below, c sets the colour and ls sets the line style.

import matplotlib.pyplot as plt

# Sample data
data = [2, 7, 1, 2, 0, 10, 2, 7, 5, 8]

# Box plot: dashed blue box outline, dash-dot red median line
plt.boxplot(data, labels=['Group 1'], boxprops=dict(c="b", ls="--"), medianprops=dict(c="r", ls="-."))

# Title and labels
plt.xlabel('Groups')
plt.ylabel('Values')
plt.title('Box Plot Example')
plt.grid(True)

# Show the plot
plt.show()
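The examples so far plot a single dataset, while one of the main uses of box plots mentioned earlier is comparing groups. The sketch below shows one way to do that by passing a list of datasets to boxplot(). The data for the second and third groups are made up for illustration and are not part of the original tutorial. The tick labels are set with set_xticks() and set_xticklabels(), on the assumption that the boxes sit at the default positions 1, 2, 3, so the code does not depend on which label parameter name the installed Matplotlib version expects.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data for three groups (illustration only)
group_a = [2, 7, 1, 2, 0, 10, 2, 7, 5, 8]
group_b = [4, 5, 6, 5, 4, 6, 5, 7, 5, 6]
group_c = [1, 3, 2, 15, 2, 3, 1, 2, 3, 2]

fig, ax = plt.subplots()

# A list of datasets produces one box per dataset, at x = 1, 2, 3
artists = ax.boxplot([group_a, group_b, group_c])

# Label the boxes by setting the tick labels at the default positions
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Group A", "Group B", "Group C"])

ax.set_xlabel("Groups")
ax.set_ylabel("Values")
ax.set_title("Comparing Groups")
ax.grid(True)

# Call plt.show() here to display the figure
```

In this made-up data, Group B is tightly clustered, Group A is widely spread, and the value 15 in Group C lies beyond the upper whisker, so it is drawn as an individual outlier point.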

This is an original matplotlib box plots tutorial. Created by aicorr.com.
