LIS 4317 Data Visualization: Making a Tufte Style Visualization

Figure 1: Plot of 1974 Motor Trend Car Horsepower against MPG.

Edward Tufte set out principles of design for the visualization of data. One of the big takeaways from his teachings is to maximize the amount of data in a visualization and minimize the amount of ink used to show it. Extraneous markings often just obscure the data and can distort its meaning. To explore some of these ideas, I made the Marginal Histogram Scatterplot above in Figure 1. I wrote an R script to make the plot, which you can find on my GitHub through the link at the bottom of this post.

To make the plot I used the ggplot2 package along with ggExtra and ggthemes. The code to make the plot is as follows.

#Load data
my_cars <- mtcars

#Load necessary packages
library(ggplot2)
library(ggExtra)
library(ggthemes)

#Make plot
#Marginal Histogram Scatterplot
p <- ggplot(my_cars, aes(hp, mpg)) + 
  geom_point(position = "jitter") + 
  theme_tufte(ticks = FALSE)
ggMarginal(p, type = "density")

The plot is simple, so the code is simple too. Here I create a basic scatter plot and apply theme_tufte() from ggthemes, which removes a lot of the unnecessary ink. The next step is to use the ggMarginal() function from ggExtra to draw the distribution plots in the margins. Typically this is done with dot plots; however, those are not fully supported in ggplot2, so I am using a simple density plot in the margins. In this case the density plot describes the general trends in the data better than a histogram does, because the data are heavily segmented.

For the sake of comparison, here is the same plot, except with a histogram in the margins.

Figure 2: Same as Figure 1, except with a Histogram.
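The exact code behind Figure 2 is not shown in this post. As a minimal sketch, assuming the only change from Figure 1 is the type argument passed to ggMarginal(), the histogram version could be produced like this. The bin settings are left at their defaults here, and the name p_hist is only for illustration.

```r
# Sketch of the Figure 2 variant; assumes ggplot2, ggExtra, and ggthemes are installed.
library(ggplot2)
library(ggExtra)
library(ggthemes)

# Load data
my_cars <- mtcars

# Same base scatterplot as in Figure 1
p <- ggplot(my_cars, aes(hp, mpg)) +
  geom_point(position = "jitter") +
  theme_tufte(ticks = FALSE)

# Assumption: only the marginal type changes, from "density" to "histogram"
p_hist <- ggMarginal(p, type = "histogram")
p_hist
```

Because position = "jitter" adds random noise to the points, the scatter will look slightly different on each run unless a seed is set first with set.seed().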

Personally, I like the simplicity that results from Tufte’s recommendations. The plots are aesthetically pleasing, have a more formal feel to them, and still communicate the data clearly.

Links:

GitHub: https://github.com/SimonLiles/LIS4317DataVisualization/blob/main/LIS4317Mod11.R