Issues with Bland-Altman plots when the outcome is ordinal
Someone recently proposed using Bland-Altman plots to look at agreement between two ordinal scales taking integer values from 0 to 5. This intuitively felt wrong for two reasons: I recalled that the Bland-Altman limits of agreement rely on a normality assumption, and the range of possible differences is discrete and finite. I decided to think about this a little bit more.
Suppose the following:
- Each individual, indexed by $i$, has two ordinal measurements taken, called PMAC1$_i$ and PMAC2$_i$, where $i=1,\dots,N$.
- PMAC1 and PMAC2 are ordinal variables taking on values 0, 1, …, 5.
The Bland-Altman plot (Bland and Altman, 1999, Statistical Methods in Medical Research) is constructed as follows:
- For each individual, calculate the difference between the two measurements, $d_i = \text{PMAC2}_i - \text{PMAC1}_i$, and plot it against the average of the two measurements, $\tfrac{1}{2}(\text{PMAC1}_i + \text{PMAC2}_i)$. Denote the mean of the differences $d_i$ by $\bar{d}$ and their standard deviation by $s_d$.
- Plot the line $y=\bar{d}$.
- Plot the 95% limits of agreement, defined by $y = \bar{d} \pm 1.96\, s_d$.
- If the differences are normally distributed, we would expect approximately 95% of the differences to lie within the 95% limits of agreement.
- “In general, a large $s_d$ and hence widely spaced limits of agreement is a much more serious problem.”
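The construction steps above can be sketched in Python with NumPy. This is a minimal sketch, not the code behind this post's figures; the helper name `bland_altman_stats` and the use of the sample standard deviation (`ddof=1`) for $s_d$ are my assumptions. It returns the plotted coordinates together with $\bar{d}$, $s_d$, the limits of agreement, and the fraction of differences falling inside them.

```python
import numpy as np


def bland_altman_stats(pmac1, pmac2, z=1.96):
    """Hypothetical helper: Bland-Altman summaries for paired measurements."""
    x1 = np.asarray(pmac1, dtype=float)
    x2 = np.asarray(pmac2, dtype=float)
    d = x2 - x1                 # pairwise differences d_i = PMAC2_i - PMAC1_i
    avg = 0.5 * (x1 + x2)       # horizontal-axis values (pair averages)
    d_bar = d.mean()            # mean difference, plotted as y = d_bar
    s_d = d.std(ddof=1)         # sample SD of the differences (ddof=1 assumed)
    lower, upper = d_bar - z * s_d, d_bar + z * s_d  # limits of agreement
    # Observed fraction of differences inside the limits (boundaries inclusive)
    coverage = np.mean((d >= lower) & (d <= upper))
    return {"d": d, "avg": avg, "d_bar": d_bar, "s_d": s_d,
            "lower": lower, "upper": upper, "coverage": coverage}


# Made-up illustrative pairs (not the post's data):
# the differences are 1, 0, 2, 0, so d_bar = 0.75
stats = bland_altman_stats([1, 2, 3, 4], [2, 2, 5, 4])
print(stats["d_bar"], stats["lower"], stats["upper"], stats["coverage"])
```

The returned `avg` and `d` arrays are the coordinates for the scatter plot, and `coverage` is the quantity compared against 95% in the examples that follow.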
I wanted to see if I could come up with a data scenario where the Bland-Altman plot would not be ideal in the case of PMAC1 and PMAC2, which are ordinal variables ranging from 0 to 5.
But first, let’s look at an ideal case.
Example where a Bland-Altman plot is appropriate
Using $N=118$, I generated PMAC1 and PMAC2 independently from normal distributions with means 1 and 4, respectively, and a common SD of 3.5. This treats PMAC1 and PMAC2 as continuous (not ordinal). I created the Bland-Altman plot below, with lines for the mean difference $\bar{d}$ and the limits of agreement in gray.

You can see that the histogram of the differences (on the right side of the figure) is symmetric, and the 95% limits of agreement capture exactly 95% of the differences. The plot gives a sense of how close (or, alternatively, how different) the two measures are. It involves no statistical testing, and it does not determine whether the limits of agreement are clinically meaningful.
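A minimal sketch of this continuous scenario, assuming NumPy's `default_rng` with an arbitrary seed (the post does not say how its data were drawn, so the exact coverage will differ from the 95% reported above). Because PMAC1 and PMAC2 are independent with SD 3.5 each, the differences are normal with mean $4 - 1 = 3$ and SD $\sqrt{3.5^2 + 3.5^2} \approx 4.95$, so the sample coverage should land near 95%.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed (assumption)
N = 118

# Continuous case: independent normals with means 1 and 4 and a common SD of 3.5
pmac1 = rng.normal(loc=1.0, scale=3.5, size=N)
pmac2 = rng.normal(loc=4.0, scale=3.5, size=N)

d = pmac2 - pmac1                  # differences (vertical axis)
avg = 0.5 * (pmac1 + pmac2)        # pair averages (horizontal axis)
d_bar = d.mean()
s_d = d.std(ddof=1)
lower, upper = d_bar - 1.96 * s_d, d_bar + 1.96 * s_d

# Fraction of the simulated differences inside the limits of agreement
coverage = np.mean((d >= lower) & (d <= upper))

print(f"mean difference = {d_bar:.2f} (theoretical 3.00)")
print(f"s_d = {s_d:.2f} (theoretical {np.sqrt(2) * 3.5:.2f})")
print(f"limits = [{lower:.2f}, {upper:.2f}], coverage = {coverage:.1%}")
```

Rerunning with different seeds shows coverage fluctuating around 95%, which is what the normality assumption behind the limits of agreement predicts for continuous data.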
Case of ordinal measure where a Bland-Altman plot is probably not ideal
Using $N=118$, I cooked up example data where PMAC1 and PMAC2 are ordinal variables taking on integer values from 0 to 5. I created a Bland-Altman plot below, with lines for the mean difference $\bar{d}$ and the limits of agreement in gray.

Things to note:
- When the measure is ordinal with range 0 to 5, the average of the two scores (on the horizontal axis) can only take the values 0, 0.5, 1, 1.5, …, 5. (Note: we can jitter the averages slightly with a little random noise to separate overlapping points; see below.) Similarly, the differences can only take integer values from -5 to 5.
- The distribution of the differences doesn't look symmetric (and therefore doesn't look normally distributed).
- The 95% limits of agreement don't necessarily provide a good sense of the range of differences. In this example, the 95% limits of agreement capture 89% of the differences.
- Bland and Altman suggest that log-transforming can help when the distribution of differences is not symmetric. It doesn’t help in this case. (See figure below.)
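The points in the list above can be checked with a short sketch. It first enumerates every attainable average and difference for a 0 to 5 scale, then builds hypothetical ordinal data (not the data behind the figures above; the shift probabilities, seed, and jitter width of ±0.1 are all my assumptions) and computes the limits of agreement and their coverage. Because the differences are integers, coverage only changes when a limit crosses an integer value, so it can sit well away from 95%.

```python
import numpy as np

levels = np.arange(6)  # ordinal scores 0, 1, ..., 5

# Every attainable pair average and difference for a 0-5 scale
a, b = np.meshgrid(levels, levels)
possible_avgs = np.unique(0.5 * (a + b))  # 0, 0.5, ..., 5 (11 values)
possible_diffs = np.unique(b - a)         # -5, -4, ..., 5 (11 values)

# Hypothetical ordinal data: PMAC2 tends to sit 0-2 categories above PMAC1,
# clipped to the 0-5 range (assumed mechanism, for illustration only)
rng = np.random.default_rng(7)            # arbitrary seed
N = 118
pmac1 = rng.integers(0, 6, size=N)        # uniform on 0..5
shift = rng.choice([0, 1, 2], size=N, p=[0.5, 0.35, 0.15])
pmac2 = np.clip(pmac1 + shift, 0, 5)

d = pmac2 - pmac1                         # integer-valued differences
avg = 0.5 * (pmac1 + pmac2)               # half-integer averages
d_bar, s_d = d.mean(), d.std(ddof=1)
lower, upper = d_bar - 1.96 * s_d, d_bar + 1.96 * s_d

# Coverage is a step function of the limits because d is discrete
coverage = np.mean((d >= lower) & (d <= upper))

# Small horizontal jitter so overlapping points separate when plotted
avg_jittered = avg + rng.uniform(-0.1, 0.1, size=N)

print("possible averages:", possible_avgs)
print("possible differences:", possible_diffs)
print(f"limits = [{lower:.2f}, {upper:.2f}], coverage = {coverage:.1%}")
```

The jitter only moves points horizontally for display; the differences, limits, and coverage are computed from the unjittered scores.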


Regardless, I think it’s valuable that the Bland-Altman plot can provide a visual illustration of the pairwise differences observed in the data even if the 95% limits of agreement do not contain roughly 95% of the differences.
