From September to November 2017, the Turnbull government in Australia held a national postal survey asking the question: “Should the law be changed to allow same-sex couples to marry?” The result was 61.6% support with 79.5% participation, which gave the government the evidence it needed to pursue the legalisation of same-sex marriage at the federal level.
The Australian Bureau of Statistics (ABS) administered the postal survey and summarised the result on its website with the following graph:
ABS’s graph objective is to report the results without bias, so they have done well not to prime readers with any loaded narrative and selected neutral, yet saturated, colours to distinguish between the two categories of Yes and No.
The ABS is a reputable statistical institution that is typically quite careful in its reporting. However, I argue that this graph violates several qualities of data graphing.
The graph violates the quality of accuracy because it gives each state an equal weight of 12.5% (i.e. 1/8) and disregards their disproportionate populations. The decision to legalise same-sex marriage is a federal matter, and was voted in as an amendment to the Marriage Act 1961. This means that the state-level voting result is irrelevant and we must look at the Australia-wide vote. The donut charts at the top of the ABS graph show this correctly, but the bar charts lie in this respect. I understand the need to show a breakdown of the vote by state, but that must be done in a proportional manner. To give an example of the magnitude of the ‘lie’, the state of New South Wales posted 5,187,681 votes (including abstentions, which are not shown in the ABS graph above), and Northern Territory posted 138,101 votes. The total of Yes votes, No votes and abstentions was 16,006,180. This means that New South Wales should be represented with a weight of 32.41% and Northern Territory with a weight of 0.86%. Thus, in the ABS graph, New South Wales is underrepresented by a factor of 2.6 (32.41/12.5) and Northern Territory is overrepresented by a factor of 14.5 (12.5/0.86).
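The weights quoted above can be checked with a few lines of arithmetic. The sketch below is written in Python rather than the Stata used for the analysis, and the function and variable names are my own; only the three vote counts come from the text.

```python
# Eligible-voter counts quoted in the text (these include abstentions).
TOTAL = 16_006_180
eligible = {"New South Wales": 5_187_681, "Northern Territory": 138_101}

# The ABS bar chart gives each of the 8 states and territories the same visual weight.
EQUAL_WEIGHT = 1 / 8


def weight(count, total=TOTAL):
    """Share of the national total contributed by one state."""
    return count / total


for state, count in eligible.items():
    w = weight(count)
    # A ratio above 1 means the state is drawn larger than its share warrants;
    # a ratio below 1 means it is drawn smaller.
    ratio = EQUAL_WEIGHT / w
    print(f"{state}: true weight {w:.2%}, drawn at {EQUAL_WEIGHT:.1%}, ratio {ratio:.2f}")
```

Working from the unrounded shares gives slightly different ratios than dividing the rounded percentages, but both agree to one decimal place.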
Another error, which violates the quality of completeness, is the failure to report the percentage of those who abstained from the vote. Nowhere does the graph tell us that the designer left out a big chunk of the data, about 20.5% of the total votes (the complement of the 79.5% participation rate). As a result, the graph lies by suggesting that the great majority of people in Northern Territory voted Yes, whereas the truth is that the largest group actually chose to abstain, at 41.63%. Similarly, the graph lies that more than 50% of people in New South Wales and Queensland voted Yes, whereas Yes won only a plurality, at 46% and 47% respectively. In addition, the ABS graph fails to disclose the portion of illegible or unclear votes, although it can be forgiven for this omission given the somewhat immaterial percentage, at 0.228% of the total. Admittedly, on the same webpage the ABS also reports the level of participation by state in a table, but this additional disclosure does not absolve the above graph of this omission.
The graph also violates the quality of relevance as it is plagued with nonsense decorations and overlapping information. Besides the decorative postal envelope, why repeat the donut chart at the top? One donut chart would do just fine. And why the alternating dark shading across states? What sort of variation is the shading supposed to encode?
It can be argued that the quality of efficiency also suffers, as it is hard to read and memorise the important information about voting proportions. There are too many distractions, the data-to-ink ratio is low, and the bars are even ordered by total number of votes instead of by voting results.
Below I describe my attempt to resolve the identified problems.
The data is sourced from ABS’s Australian Marriage Law Postal Survey. The ABS’s datasets provided in Excel format are not well designed for additional analysis. There is always much manual work that needs to be done with ABS files in order to bring the data into a proper tabulated format.
To meet the graph objective, the important data management step is to calculate the portions of Yes, No, Abstain and Unclear as shares of the total number of votes cast by state. This is compositional data, which is characterised by bounded variation. I also calculate the state-wide number of votes for identification purposes.
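The share calculation can be sketched as follows. This is an illustrative Python version, not the Stata code used for the analysis; the function name `composition` and the example counts are hypothetical, and it assumes that Abstain is simply the remainder of eligible voters once the three kinds of returned form are counted.

```python
def composition(yes, no, unclear, eligible):
    """Return each response category as a share of eligible voters.

    Abstain is assumed to be whatever remains once Yes, No and Unclear
    responses are subtracted from the number of eligible voters.
    """
    abstain = eligible - yes - no - unclear
    if abstain < 0:
        raise ValueError("responses exceed the number of eligible voters")
    counts = {"Yes": yes, "No": no, "Abstain": abstain, "Unclear": unclear}
    return {category: n / eligible for category, n in counts.items()}


# Hypothetical counts for a single state, chosen only to illustrate the calculation.
shares = composition(yes=460, no=330, unclear=2, eligible=1000)
print(shares)
```

Because the four shares always sum to one, a larger share in one category necessarily means a smaller share somewhere else, which is the bounded variation mentioned above.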
I chose to suppress the display of the portion of Unclear, as it is an immaterial amount (0.228% of total) that does not add much information and rather detracts from decoding the graph objective.
A bar chart, as in the ABS graph, relies on the use of area implantations to encode differences in size. That is, the larger the portion, the larger the area (the bar).
As mentioned above, the data is compositional and the bars are organised in stacked form in order to show how the portion of Yes plus the portion of No plus the portion of Abstain adds to the near total.
A major improvement of the proposed solution is to employ the size retinal variable by varying the width of the bars, in order to show the disproportionate contribution of votes by state according to its share of the population. For example, the state of New South Wales should occupy 32.41% of the total area of the plot region and Northern Territory should occupy only 0.86%, and so on for the rest.
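One way to think about the variable bar widths is as a partition of the plot region into contiguous bands. The Python sketch below is an illustration under my own naming (`bar_bands` is hypothetical, and this is not the Stata code). The New South Wales and Northern Territory counts are the ones quoted earlier, and "Rest of Australia" is just the remainder of the 16,006,180 total, lumped together because the other state counts are not listed here.

```python
def bar_bands(eligible_by_state):
    """Split the unit interval into contiguous bands, one per state.

    Each band's thickness equals the state's share of all eligible voters.
    Bands touch with no gap, so together they fill the whole plot region.
    Returns {state: (start, end)} in the order the states were given.
    """
    total = sum(eligible_by_state.values())
    bands = {}
    start = 0.0
    for state, count in eligible_by_state.items():
        end = start + count / total
        bands[state] = (start, end)
        start = end
    return bands


eligible = {
    "New South Wales": 5_187_681,
    # Remainder of the national total; an aggregate used only for illustration.
    "Rest of Australia": 16_006_180 - 5_187_681 - 138_101,
    "Northern Territory": 138_101,
}

bands = bar_bands(eligible)
for state, (lo, hi) in bands.items():
    print(f"{state}: {lo:.4f} to {hi:.4f} (thickness {hi - lo:.2%})")
```

In a plotting library, each `(start, end)` pair would become the position and thickness of one horizontal bar.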
The graph objective requires the contrast of three categories: the portion of Yes, the portion of No, and the portion of Abstain. Given the qualitative categories, I choose to apply the colour retinal variable to form effective contrast using unsaturated colours. The colours are borrowed from Pantone Process Blue C (for Yes) and Pantone Green XGC (for No). For the Abstain category, I choose a light gray colour because this is effectively a missing value category that should be represented as such. That is to say, we need to see Abstain as being a ‘gray area’ that could have been swayed to be either Yes or No.
A legend internally identifies the application of the colour retinal variable on the categories of Yes, No and Abstain.
I directly identify the percentage vote for Yes and No per state, which resolves any uncertainty regarding the relative size of the areas. The percentage of Abstain can be deduced as approximately 100 – Yes % – No %, ignoring the small Unclear share. In addition, I directly identify the size of the voting sample for each state as part of the state label on the vertical axis.
External identification includes a title and a subtitle that states the exact question asked in the postal survey. I also add a note that acknowledges the data source and the exclusion of the Unclear category, which comprises 0.228% of votes.
A key graph enhancement step is to eliminate the space between bars to allow fast, effortless decoding of how the whole of Australia has voted. The emphasis must be on Australia as a whole, and not on the state-by-state results, because the question of legalising same-sex marriage is a federal matter.
Another important graph enhancement step is the ordering of the Abstain category in between the categories of Yes and No. This is done for two reasons. First, Abstain indicates missing votes that could have been a Yes or a No. Second, we know from experiments that comparisons are most accurate when magnitudes are placed on a common baseline. In this way, the length of the Yes bars can be compared using the common baseline on the left, and the No bars can be compared using the common baseline on the right. Yes and No are the key categories.
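The ordering can be made concrete by writing out where each segment starts and ends along a bar of length one. This is a minimal Python sketch with a hypothetical helper named `segments`, not the Stata implementation. The 46% Yes share is the New South Wales figure quoted earlier, while the 33% No share is a made-up value used only for illustration.

```python
def segments(yes, no):
    """Place Yes, Abstain and No along a bar of length 1.

    Yes starts at the left edge (0) and No ends at the right edge (1),
    so each key category shares a common baseline across all states.
    Abstain fills the gap in between. The small Unclear share is ignored.
    """
    if yes < 0 or no < 0 or yes + no > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    return {
        "Yes": (0.0, yes),
        "Abstain": (yes, 1.0 - no),
        "No": (1.0 - no, 1.0),
    }


# Yes share from the text; No share is hypothetical.
layout = segments(yes=0.46, no=0.33)
for category, (start, end) in layout.items():
    print(f"{category}: {start:.2f} to {end:.2f}")
```

Had Abstain been stacked last instead, the No segments would start at a different point in every state and would be harder to compare.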
I also reduce clutter by eliminating plot-region lines, axis lines, and x-axis labels, given the direct labelling of values within bars. I reduce the overall visual prominence of the titles and the note in terms of size and saturation. I set the subtitle in an italicised serif typeface to give the impression of a handwritten question.
Lastly, I reduce the visual prominence of the legend and place it in the top right-hand corner, aligned with the subtitle. That is, the legend that identifies the possible survey answers directly follows the survey question. In this way, I also maximise the plotting space, as placing the legend in the default position would contract the plot region considerably.
Here is my proposed solution:
The proposed graph meets both aspects of the graph objective. It shows how Australia voted as a whole through the size of the overall coloured areas across states (blue vs. gray vs. green).
At the same time, the graph shows the composition of the vote by state, by subtly dividing the horizontal bars using very thin white lines. Importantly, it demonstrates the weight of each state in influencing the federal-level decision.
Download the Stata code for reproducing this analysis: marriage_equality.do