Frequently I stumble across work that appears to be unaware of the following result: the ratio of two normally distributed variables is not normally distributed. The Graph Objective is to elucidate this important statistical result.
Since Geary (1930), we have known that the ratio of two independent standard normals follows a Cauchy distribution, whose moments are undefined. Tin (1965) proves that for some ratio models, when the ratio’s numerator and denominator come from a bivariate normal population, their third and fourth standardised moments increase as their correlation coefficient increases, which means that the ratio itself will be clearly non-normal.
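The Cauchy result is easy to check by simulation. The following is a minimal sketch in Python (not the Stata code used for this post); the function names, seed and sample size are illustrative assumptions. It compares the share of simulated ratios below a few thresholds with the standard Cauchy cumulative distribution function, F(t) = 1/2 + arctan(t)/π.

```python
import math
import random


def cauchy_cdf(t):
    """CDF of the standard Cauchy distribution."""
    return 0.5 + math.atan(t) / math.pi


def simulate_ratio(n, seed=12345):
    """Draw n ratios x / y, with x and y independent standard normals."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    return [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(n)]


def empirical_share_below(values, threshold):
    """Fraction of the sample that is less than or equal to the threshold."""
    return sum(v <= threshold for v in values) / len(values)


ratios = simulate_ratio(200_000)
for t in (-3.0, -1.0, 0.0, 1.0, 3.0):
    # Empirical share next to the theoretical Cauchy probability
    print(t, round(empirical_share_below(ratios, t), 3), round(cauchy_cdf(t), 3))
```

With a sample of this size the two columns should agree to roughly two decimal places, which a normal distribution with any fixed variance could not reproduce at every threshold.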
Marsaglia (1965) also shows that dividing one scaled normal variable by another results in a distribution that is neither normal nor symmetric and that depends on the nature of the two ratio components. To illustrate the extent of potential non-normality, Marsaglia (1965) shows that for indices (a+x)/(b+y), with a>0, b>0 and x, y i.i.d. standard normal random variables, the distribution of the ratio is bimodal when a>2.256.
Marsaglia (2006) extends this result and explains that only “If a<2.256 and 4<b then the ratio (a+x)/(b+y) is itself approximately normally distributed.” Otherwise the result can be quite surprising. Here is Figure 1 from Marsaglia (1965) explaining the original result:
Marsaglia’s small-multiples graph is superb. It is carefully crafted and very instructive. The Graph Objective here is to elucidate the severity of the problem by focusing on some of the densities on the right-hand side of the figure above, and I will do so by graphing their cumulative distribution functions.
The data management process is concerned with simulated data.
I simulate two standard normal variables and calculate their ratio to get the Cauchy. Then, I scale up the two standard normals and take their ratios to derive the bimodal result. To get the cumulative distribution, I first sort by each ratio variable and then calculate each observation's index in the sorted sample.
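The data management steps can be sketched as follows. This is a Python illustration under stated assumptions, not the Stata code linked at the end of the page: the helper names, the seed, the sample size and the constants a=3, b=1 are hypothetical choices, and the index is taken here to be the rank divided by the sample size.

```python
import random


def simulate_ratios(n, a, b, seed=2024):
    """Return n draws of (a + x) / (b + y), with x and y independent standard normals."""
    rng = random.Random(seed)  # arbitrary fixed seed for reproducibility
    return [(a + rng.gauss(0.0, 1.0)) / (b + rng.gauss(0.0, 1.0)) for _ in range(n)]


def ecdf(values):
    """Sort the values and pair each one with its rank divided by the sample size."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]


# a = b = 0 gives the plain ratio of two standard normals, i.e. the Cauchy case.
cauchy_x, cauchy_p = ecdf(simulate_ratios(10_000, a=0.0, b=0.0))

# a = 3, b = 1 is a hypothetical example of a ratio in Marsaglia's (a+x)/(b+y) form.
shifted_x, shifted_p = ecdf(simulate_ratios(10_000, a=3.0, b=1.0))
```

Plotting each `*_p` list against its `*_x` list as a line gives the empirical cumulative distribution of that ratio.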
The Stata code describing the data management steps is provided at the end of this page.
The standard approach to visualising the cumulative normal uses the line visual implantation. To compare the many variables we will need several line implantations.
To contrast the many line implantations I apply the colour retinal variable. The theoretical standard normal is the reference variable and is therefore assigned a dull grey colour. The Cauchy is an important result by itself, so I choose to distinguish it from the rest using the navy colour. The remaining lines are assigned an orange colour and a sequential colour value scale.
I also use the size retinal variable to make the line of the reference standard normal twice as thick, and the Cauchy one and a half times as thick. I find this subtle adjustment necessary because the red hue attracts more attention than the grey or navy colours, and the thicker lines give the perception of equal importance.
The many lines are internally identified using a legend that describes the ratio form of each distribution.
External identification includes a graph title describing the Graph Objective and a note describing the key results and references for more context. I also provide the x-axis labels that are typically used to describe the standard Normal range.
The legend is placed within the plot region in order to save space. I add a reference line at x = 0 to assist decoding; it makes clear how asymmetric the red-coloured cumulative distribution functions are.
An important graph enhancement step is to restrict the range of values of the x-axis from -3 to 3. This allows for an improved contrast of the theoretical Normal with the other distributions, and the decoding of the fat tails especially for the Cauchy.
Here is the proposed solution:
Restricting the range to between -3 and 3 makes it clear that the Cauchy has very fat tails, which is why its variance is undefined. This is particularly evident by comparison with the theoretical standard normal, whose tails decay very fast.
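The size of the gap can be computed directly from the two theoretical distributions. The sketch below (Python, with illustrative function names) evaluates the probability mass lying outside the plotted range of -3 to 3 for the standard normal and for the standard Cauchy.

```python
import math
from statistics import NormalDist


def normal_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return 2.0 * (1.0 - NormalDist().cdf(k))


def cauchy_tail(k):
    """P(|C| > k) for a standard Cauchy C, using F(t) = 1/2 + arctan(t)/pi."""
    return 1.0 - (2.0 / math.pi) * math.atan(k)


print(round(normal_tail(3.0), 4))  # roughly 0.0027
print(round(cauchy_tail(3.0), 4))  # roughly 0.2048
```

In other words, about 0.3% of the standard normal lies outside the plotted range, against roughly a fifth of the Cauchy.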
The red-coloured distributions are bimodal, with a small first mode appearing just before zero and a much larger second mode appearing after zero. All in all, the key lesson is to never divide or multiply one normally distributed variable by another normally distributed variable.
Download the Stata code for reproducing this analysis: ratio_normal.do