Lying with Statistics

Statistics ggplot2 Visualizations R

It’s quite easy to manipulate raw data in a manner that “proves” your point. To explore this topic further, I’ll analyze police killing data and present it in four different ways.

Javier Orraca (Scatter Podcast)
04-19-2020

Let’s explore four plots and see how we can #LieWithStatistics…

Plot 1: Police killings by date, by race
General observation: Police kill more white people than black people

Police Killings, Plot 1, made with R’s ggplot2

Plot 2: Police killing boxplot showing murder rates, by race, by police department
General explanation and takeaway: The dots on each boxplot are the statistical outliers. The whiskers extend out to the “min” and “max”, which in ggplot2’s default boxplot are the most extreme values within 1.5 times the interquartile range of the box. The horizontal lines of each box (from bottom to top) mark the first quartile (25th percentile), the median (50th percentile), and the third quartile (75th percentile).
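To make the boxplot anatomy concrete, here is a minimal sketch of how the box, whiskers, and outlier dots can be computed with the common 1.5 × IQR rule. The plots in this post were made with R's ggplot2; Python is used here only to illustrate the arithmetic. The function name `boxplot_stats` and the sample rates are hypothetical and are not taken from the Mapping Police Violence data.

```python
import statistics

def boxplot_stats(values):
    """Return quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    # Quartiles by linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    # Anything beyond the fences is drawn as an individual dot.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical rates for illustration only.
rates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]
stats = boxplot_stats(rates)
print(stats)
```

In this toy sample the value 30.0 falls outside the upper fence, so it would be drawn as a dot, and the upper whisker stops at 8.0 rather than at the true maximum.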

Police Killings, Plot 2, made with R’s ggplot2

Plot 3: Police killing boxplot, now log-transforming the murder rates to more easily identify statistical differences, by race
General explanation and takeaway: Log-transforming data points for visualization or modeling purposes compresses large values, which reduces the influence of outliers and makes skewed distributions easier to compare. In effect, I re-expressed the murder rates on a multiplicative scale, where equal distances represent equal ratios.
Important caveat: Are Native Americans more likely to die by police than other races? Sure looks like it… but see Plot 4 for more thoughts
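As a rough illustration of what a log transform does to skewed data, the sketch below uses hypothetical rates (not the post's data) and Python's standard library rather than R. It shows that the spread shrinks, that equal ratios become equal distances, and that the ordering of the values is preserved.

```python
import math
import statistics

# Hypothetical rates spanning several orders of magnitude (illustration only).
rates = [0.5, 1.0, 2.0, 4.0, 400.0]

raw_spread = max(rates) - min(rates)
log_rates = [math.log(r) for r in rates]
log_spread = max(log_rates) - min(log_rates)

# Equal ratios become equal distances: doubling from 1 to 2 covers the
# same distance on the log scale as doubling from 2 to 4.
step_1_to_2 = math.log(2.0) - math.log(1.0)
step_2_to_4 = math.log(4.0) - math.log(2.0)

# The log is monotonic, so the middle value stays the middle value:
# the median of the logged rates is the log of the median rate.
median_of_logs = statistics.median(log_rates)
log_of_median = math.log(statistics.median(rates))

print(raw_spread, round(log_spread, 2))
```

The single extreme value dominates the raw spread but contributes far less after the transform, which is why the boxes in a log-scaled plot are easier to compare side by side.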

Police Killings, Plot 3, made with R’s ggplot2

Plot 4: Police killing boxplot, now log-transforming the murder rates using log base 10 (for easier interpretation) and “fixing” the Native American data points that caused a misleading impression in Plot 3. Native American death rates appeared much higher than the others in Plot 3 because of how zeroes behave under a log transform: log(0) is undefined (R returns -Inf; it is log(1) that equals 0).
General takeaway: There were so few Native American data points that log-transforming all of the zeroes was unintentionally distorting the analysis. It would appear black people are almost an order of magnitude more likely to be killed by police than white people.
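The post does not spell out exactly how the zeroes were handled, so the following is only a sketch of one way zero rates can mislead on a log scale: if zeroes are silently dropped because their log is undefined, the remaining values look much higher than the group as a whole. The helper `safe_log10` and the rates are hypothetical. Note that Python's `math.log10(0)` raises a `ValueError`, whereas R's `log10(0)` returns `-Inf`.

```python
import math
import statistics

def safe_log10(x):
    """Return log10(x), or None when x is zero (the log of zero is undefined)."""
    return math.log10(x) if x > 0 else None

# Hypothetical per-department rates for a small group: mostly zeroes.
rates = [0.0, 0.0, 0.0, 0.0, 10.0, 100.0]

logged = [safe_log10(r) for r in rates]
kept = [v for v in logged if v is not None]  # the zeroes vanish here

# Median with every department included.
median_all_raw = statistics.median(rates)
# Median of only the values that survived the log, converted back to a rate.
median_kept_raw = 10 ** statistics.median(kept)

# On a log10 axis, a gap of 1 unit corresponds to a factor of 10,
# which is what "an order of magnitude" means.
ratio_for_unit_gap = 10 ** (2.0 - 1.0)

print(median_all_raw, round(median_kept_raw, 2), ratio_for_unit_gap)
```

With four of six values dropped, the median jumps from zero to roughly 31.6 in this toy example, which is the kind of distortion a small group with many zeroes can produce.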

Police Killings, Plot 4, made with R’s ggplot2

I do not seek to answer questions of “why” systemic injustice exists in the US, but I wanted to analyze police killing data and share these dialectical investigations.

Source:
* Samuel Sinyangwe & the Mapping Police Violence team

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Orraca (2020, April 19). Javier Orraca: Lying with Statistics. Retrieved from https://www.javierorraca.com/posts/2020-04-19-Police-Killings/

BibTeX citation

@misc{orraca2020lying,
  author = {Orraca, Javier},
  title = {Javier Orraca: Lying with Statistics},
  url = {https://www.javierorraca.com/posts/2020-04-19-Police-Killings/},
  year = {2020}
}