a little math on non-zero baselines
I had a friendly exchange with a blog reader over the past week related to my recent post highlighting some Pew Research makeovers. In that post, I made some comments regarding the use of a non-zero baseline: specifically, that it's not ok to have a non-zero baseline with a bar chart (see related blog post), but that you can get away with it in a line graph. The reader's question was how shifting to a non-zero baseline impacts the slopes of the lines in a line graph.
That's a very good question.
One I hadn't given any thought to before.
But now that I was thinking about it, I worried I'd been recommending something incorrect.
This must have really been weighing on me, because the night after I received the initial email asking about the impact on slopes, I had a dream where I was doing the math to show that it's actually ok. Next challenge: to see whether I could replicate my proof in reality. The short answer is yes.
Rescaling from a y-axis that begins at zero to one that does not begin at zero actually doesn't impact the slope of the lines. To demonstrate this, I sketched out an example, leveraging some lessons learned back in 7th grade algebra:
On the left hand side, I've plotted a line that connects the data values 6, 7, and 19 (to which I've given x-coordinates of 1, 2, and 3, respectively). In this initial version, the y-axis ranges from a minimum of zero to a max of twenty. The calculations for the slopes of the two line segments that connect these points are shown below the graph.
On the right hand side, I've plotted the same points on a scale from 5 to 20, reducing the y-axis coordinates by 5 each to reflect this change in scale. Below the graph, we see the math for the slopes of the two lines connecting these points. The slope of each line is the same as it was initially. In other words, using a non-zero baseline does not impact the slope of the lines in a line graph.
Just to be sure I didn't inadvertently pick lucky numbers in this example, I did a second example with points (5, 12), (10, 17), and (20, 42) on a full y-axis scale from 0 to 50, and then one from 5 to 50 (again, reducing the y-coordinates appropriately to reflect this rescaling). I found the same thing: the slopes of the lines remain the same between the graph with the zero baseline and the one that's been rescaled. When I thought about this some more later, it seemed obvious - of course the slope of the lines doesn't change, because I'm not changing the points relative to each other. I'm only changing their location relative to the x-axis.
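For anyone who wants to check the arithmetic without sketching it by hand, here is a minimal Python version of the same calculation. The function names (segment_slopes, shift_baseline) are my own for illustration, and I'm assuming the rescaling amounts to subtracting the new baseline (5) from each y-value, as in the two examples above.

```python
def segment_slopes(points):
    """Slope (rise over run) of each segment joining consecutive (x, y) points."""
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


def shift_baseline(points, baseline):
    """Re-express y-values relative to a new baseline (assumed: simple subtraction)."""
    return [(x, y - baseline) for x, y in points]


first_example = [(1, 6), (2, 7), (3, 19)]
second_example = [(5, 12), (10, 17), (20, 42)]

for points in (first_example, second_example):
    original = segment_slopes(points)
    rescaled = segment_slopes(shift_baseline(points, 5))
    # The subtracted baseline cancels in (y2 - y1), so the two lists match.
    print(original, rescaled, original == rescaled)
```

The baseline drops out because it is subtracted from both y-values in each rise calculation: (y2 - 5) - (y1 - 5) is just y2 - y1.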
But the conversation didn't end there. When I shared this with blog reader Roberto, he responded with a couple of graphs to help illustrate his points. The first shows the original line (blue) on the primary y-axis (ranging from 0 to 20) and the line rescaled onto a secondary axis (red; with axis ranging from 5 to 20).
The next graph shows the same initial line on the primary axis (blue) and the line rescaled onto a secondary axis (red) that ranges from 5 to 105.
It's true that the absolute perception of steepness changes with the changing axis range. You see this when comparing either line that's plotted on the secondary axis to the original. But I'm not convinced that the relative slope between the two segments of the line is impacted. Rather, the two segments appear to steepen or flatten together as the axis range changes.
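To put rough numbers on that intuition, here is a small sketch of how steep each segment looks on the page for the three axis ranges discussed above. It rests on assumptions of my own: the plot area is treated as a 1-by-1 square, the x-axis runs from 1 to 3, and the function name on_screen_slopes is hypothetical. Real charting tools add padding and other details, so take it as an illustration rather than a model of any particular tool.

```python
def on_screen_slopes(points, y_min, y_max, x_min=1, x_max=3, width=1.0, height=1.0):
    """Apparent slope of each segment once data units are mapped to plot units.

    Assumes a linear mapping of [x_min, x_max] onto `width` and
    [y_min, y_max] onto `height` (a simplification of real chart layout).
    """
    x_scale = width / (x_max - x_min)
    y_scale = height / (y_max - y_min)
    return [
        ((y2 - y1) * y_scale) / ((x2 - x1) * x_scale)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


points = [(1, 6), (2, 7), (3, 19)]

for y_min, y_max in [(0, 20), (5, 20), (5, 105)]:
    first, second = on_screen_slopes(points, y_min, y_max)
    # Both segments are multiplied by the same factor, so their ratio holds steady.
    print((y_min, y_max), round(first, 3), round(second, 3), round(second / first, 3))
```

Under these assumptions, changing the axis range multiplies every segment's apparent slope by the same factor. The lines look steeper or flatter overall, but the second segment stays about twelve times as steep as the first in every case.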
To make sure I'm not promoting anything inappropriate, I consulted a couple other sources/experts.
Alberto Cairo said to retain the zero baseline whenever possible, to avoid confusion. When this isn't feasible, he suggested creating two line graphs rather than one: the larger graph is the zoomed-in view, and the one with the zero baseline appears as a small inset in one corner, without the scale (just the baseline). This is an interesting solution, and one I plan to try out when the next opportunity presents itself.
I also consulted Stephen Few's Show Me the Numbers, where his description of zero-based scales reads as follows:
When you set the bottom of your quantitative scale to a value greater than zero, differences in values will be exaggerated visually in the graph. You should generally avoid starting your graph with a value greater than zero, but when you need to provide a close look at small differences between large values, it is appropriate to do so. Make sure you alert your readers that the graph does not give an accurate visual representation of the values so that your readers can adjust their interpretation of the data accordingly.
He follows this up with an example zoomed-in line graph with the following warning: "Attention: The dollar scale along the vertical axis is narrow to reveal the subtle, yet steady rise in sales since July."
So the bottom line is: you can have a non-zero baseline in line graphs (which can be useful when the numbers you want to show are some distance away from zero), but I (and other experts) urge care when doing so. You want to take context into account and make sure you aren't zooming in a way that visually overemphasizes minor differences. Also, make it clear to your reader that you aren't utilizing the full scale. Agree/disagree? Have other ideas for addressing this challenge? Leave a comment with your thoughts.
Big thanks to Roberto for his thought-provoking comments (please feel free to jump in if I've mischaracterized anything; also, for those who might be interested, Roberto's Excel gallery can be found here). Thanks also to Alberto for taking the time to read my draft post and lend his thoughts.