Graph objective
Larrick and Soll (2008, Science), in “The MPG illusion”, demonstrate the extent and pervasiveness of a perceptual bias that arises when people are asked to judge fuel efficiency on the basis of miles-per-gallon (MPG). The graph objective is to illustrate the source of this bias.
Miles-per-gallon (or equivalently, kilometers-per-litre) is a ratio metric that car manufacturers typically use to measure fuel efficiency. MPG may be useful from the perspective of the manufacturer, but from the perspective of the customer what matters most is gallons-per-mile (GPM), because customers are more interested in knowing how much it costs to drive a car during a given period or for an average trip.
To demonstrate this point, Larrick and Soll (2008) ran the following experiment. Let's say that you have two cars, and you want to reduce your running costs and help the environment at the same time. To do so, you are thinking of replacing one of these cars, and you are given the following alternatives. Which of the two cars would you choose to replace?
- Car A runs on 34 MPG and can be changed to a new car that runs on 50 MPG, i.e. an improvement of 16 MPG.
- Car B runs on 18 MPG and can be changed to a new car that runs on 27 MPG, i.e. an improvement of 9 MPG.
Most people would choose to change Car A because they fall into the trap of perceiving the relation between MPG and fuel use as linear. In fact, changing Car A saves only about half as much fuel as changing Car B. For a trip of, let's say, 1,000 miles it holds that:
- Changing Car A saves: 1,000 miles / 34 MPG – 1,000 miles / 50 MPG = 29.4 gallons – 20 gallons = 9.4 gallons.
- Changing Car B saves: 1,000 miles / 18 MPG – 1,000 miles / 27 MPG = 55.6 gallons – 37.0 gallons = 18.5 gallons.
This is because the relation between the gallons of fuel used for a given trip and MPG is curvilinear. Replacing an inefficient car saves disproportionately more fuel than replacing an already efficient car with an even more efficient one.
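The arithmetic above can be checked with a few lines of code. The following is a minimal Python sketch, not the author's Stata code; the function names `gallons_used` and `gallons_saved` and the constant `TRIP_MILES` are illustrative assumptions.

```python
# Minimal sketch of the Car A / Car B comparison (illustrative names only).

def gallons_used(miles, mpg):
    """Fuel needed to cover `miles` at a fuel economy of `mpg`."""
    return miles / mpg


def gallons_saved(miles, old_mpg, new_mpg):
    """Fuel saved over `miles` when moving from `old_mpg` to `new_mpg`."""
    return gallons_used(miles, old_mpg) - gallons_used(miles, new_mpg)


TRIP_MILES = 1000  # same trip length as in the worked example

saving_a = gallons_saved(TRIP_MILES, 34, 50)  # Car A: 34 -> 50 MPG
saving_b = gallons_saved(TRIP_MILES, 18, 27)  # Car B: 18 -> 27 MPG

print(f"Car A: +16 MPG saves {saving_a:.1f} gallons")
print(f"Car B: +9 MPG saves {saving_b:.1f} gallons")
```

Because fuel use is distance divided by MPG, the same step in MPG is worth less and less as MPG rises, which is why the smaller MPG improvement of Car B yields the larger saving.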
We can demonstrate this effect by visualising the deterministic relations, and the gains from changing either Car A or Car B:

Data management
The graph objective can be fulfilled by illustrating the deterministic relation above, but to provide some context I will also show how real data follow this deterministic relation. The dataset used is sourced from Kaggle and contains information on 398 cars, including MPG.
To visualise the relation between the gallons of fuel used for a given trip (let's say a 1,000-mile trip) and the MPG metric, we also need to create a new variable, expressed as ‘gallons-per-1000-miles-trip’. This variable is effectively the inverse of miles-per-gallon multiplied by 1,000, that is, 1,000 / MPG.
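The derived variable can be sketched as follows. This is a hedged Python illustration rather than the Stata data management code referred to on this page; the function name `gallons_per_1000_miles` is an assumption, and the MPG values are made up for illustration and are not rows from the Kaggle dataset.

```python
# Minimal sketch of the derived variable: gallons per 1,000-mile trip.

def gallons_per_1000_miles(mpg):
    """Return 1,000 / MPG, i.e. the fuel needed for a 1,000-mile trip."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return 1000 / mpg


# Made-up MPG values for illustration only (not taken from the dataset).
example_mpg = [10.0, 20.0, 25.0, 40.0]
example_gallons = [gallons_per_1000_miles(m) for m in example_mpg]

for m, g in zip(example_mpg, example_gallons):
    print(f"{m:5.1f} MPG -> {g:6.1f} gallons per 1,000 miles")
```

Since the new variable is a pure transformation of MPG, every observation necessarily sits on the curve 1,000 / MPG.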
The Stata code describing the data management steps is provided at the end of this page.
Visual implantations
The relation between gallons-per-1000-miles-trip and miles-per-gallon will be demonstrated using two approaches: (i) the line implantation for encoding the deterministic curvilinear relation, and (ii) the point implantation for encoding the observed data, which should lie on the curved line. Because gallons-per-1000-miles-trip is computed directly from MPG, all observations fall onto the predicted relation.
Retinal variables
The visual prominence of the deterministic relation line should be emphasised, so I use the size retinal variable to draw a thicker line and the colour retinal variable to select a hue that is distinctly different from the one used for encoding the point implantation.
The visual prominence of the point implantation should be reduced. The default shape that Stata uses for the point implantation (filled-in circles) is inappropriate in this case, as it hides the line implantation. A more appropriate shape here is the vertical pipe ‘|’ with reduced line width.
Graph identification
In terms of internal identification, there is no need for a legend or direct labelling in order to differentiate between the line and the points. This is because all point markers fall on the deterministic line, so it is obvious that the two are directly related.
External identification includes a title describing the graph objective, a note acknowledging the source of the data, and axis titles. There is no need for regularly spaced axis labels, as a table look-up function is not part of the graph objective. To give some context, it suffices to show only the minimum and maximum of each axis.
Graph enhancement
As noted above, there is no need to enable table look-up, so we suppress the axis grids. I also suppress all axis lines and instead turn the minimum and maximum of each axis into pseudo axis lines. This focuses attention on the line implantation, which is the goal of the graph objective.
I adjust the aspect ratio to 1:1.75, which I find to be more suitable for this graph.
Visual decoding/perception
Here is the proposed solution:

This graph and the graph above demonstrate the problems associated with perceiving changes in a curvilinear relation. Changing a very efficient car to an even more efficient car brings very little gain in terms of gallons of fuel saved, whereas changing a very inefficient car to a marginally more efficient car saves disproportionately more gallons.
The effect, documented by Larrick and Soll (2008), has far broader implications. Policy makers could give incentives for replacing inefficient cars with even slightly more efficient ones, and should avoid wasting resources on incentives for changing all types of cars regardless of their efficiency.
Such visual decoding prompts the proposition of switching from measuring fuel efficiency in terms of MPG to its inverse, GPM, because the gallons of fuel consumed per 1,000 miles are linear in GPM:

Asking people to change cars on the basis of GPM would always lead to the right decision. All it takes is some sort of educational campaign to inform consumers accordingly.
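The linearity argument can be illustrated with the Car A and Car B example from the start of this page. The following is a minimal Python sketch under stated assumptions: the function names `gpm`, `gpm_improvement` and `gallons_saved` are illustrative and not part of the original analysis.

```python
# Minimal sketch: in GPM terms, the improvement is proportional to fuel saved.

def gpm(mpg):
    """Gallons-per-mile, the inverse of miles-per-gallon."""
    return 1 / mpg


def gpm_improvement(old_mpg, new_mpg):
    """Reduction in gallons-per-mile when moving from old_mpg to new_mpg."""
    return gpm(old_mpg) - gpm(new_mpg)


def gallons_saved(miles, old_mpg, new_mpg):
    """Fuel saved over `miles`, computed directly from the MPG figures."""
    return miles / old_mpg - miles / new_mpg


improvement_a = gpm_improvement(34, 50)  # Car A
improvement_b = gpm_improvement(18, 27)  # Car B

# For any trip length, fuel saved equals miles times the GPM improvement,
# so the car with the larger GPM improvement always saves more fuel.
for miles in (100, 1000, 12000):
    direct_a = gallons_saved(miles, 34, 50)
    direct_b = gallons_saved(miles, 18, 27)
    print(f"{miles:6d} miles: A saves {direct_a:8.2f}, B saves {direct_b:8.2f}")
```

Comparing `improvement_a` with `improvement_b` gives the same ranking as comparing the gallons saved, whatever the distance driven, which is the property that MPG differences lack.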
Download the Stata code for reproducing this analysis: mpg_gpm.do