LIS 4317 Data Visualization: Spotting Differences

For this week’s visualization I used the mtcars dataset that comes built into R. The data was extracted from the 1974 Motor Trend US magazine and covers 32 cars, with 11 variables including horsepower, fuel efficiency, and weight.
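These figures can be confirmed directly in R. The minimal check below uses only base R and inspects the built-in data frame, including the five variables used later in this post.

```r
# mtcars ships with R's datasets package, so nothing needs to be installed
data(mtcars)

# One row per car, one column per variable
dim(mtcars)  # 32 11

# Names and types of the variables used in this visualization
str(mtcars[, c("hp", "wt", "qsec", "cyl", "mpg")])
```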

To create the visualization in this post I wrote an R script, which is available on my GitHub (linked at the bottom of this page).

I used the horsepower, weight, quarter-mile time, cylinder count, and fuel efficiency variables in the data visualization above. The x-axis shows horsepower divided by weight, a power-to-weight ratio (horsepower per 1,000 lbs, since mtcars records weight in thousands of pounds) that measures how much power a car has relative to its weight. This is compared against quarter-mile time on the y-axis, measured in seconds, where a lower time means a faster car. The color of each point shows the number of cylinders, and its size shows fuel efficiency in miles per gallon. To create this visualization in R, I used the ggplot2 library in the following lines of code:

```r
library(ggplot2)
myCars <- mtcars  # assumes myCars is a copy of the built-in mtcars data
ggplot(myCars, aes(x = hp/wt, y = qsec, color = as.factor(cyl), size = mpg)) +
  geom_point()
```
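One way the plot above could be refined is sketched below. It keeps the same aesthetic mappings but adds descriptive axis and legend titles, partial transparency so overlapping points stay visible, and a lighter theme. These particular choices (`alpha = 0.7`, `theme_minimal()`, the label wording) are my own assumptions for illustration rather than part of the original script, and the sketch assumes ggplot2 is installed.

```r
library(ggplot2)

# Assumption: myCars is an unmodified copy of the built-in mtcars data
myCars <- mtcars

p <- ggplot(myCars, aes(x = hp / wt, y = qsec,
                        color = factor(cyl), size = mpg)) +
  geom_point(alpha = 0.7) +  # transparency keeps overlapping points visible
  labs(
    x = "Power-to-weight ratio (hp per 1,000 lbs)",
    y = "Quarter-mile time (seconds)",
    color = "Cylinders",
    size = "Miles per gallon",
    title = "Power-to-weight ratio vs. quarter-mile time (mtcars)"
  ) +
  theme_minimal()  # removes the grey panel background

print(p)
```

Explicit legend titles matter here because the default titles would read `factor(cyl)` and `mpg`, which a reader unfamiliar with the dataset may not decode.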

This visualization does fit the spotting-differences approach, although further adjustments to the code could make the differences in the data more obvious. Here, point size lets the viewer compare fuel efficiency across cars with different performance, as shown by their power-to-weight ratio and quarter-mile time. For example, the plot suggests that, in general, lower-performance cars have better fuel efficiency, while high-performance cars have lower fuel efficiency. Because color represents the number of cylinders, we can also see that cars with fewer cylinders tend to have better fuel efficiency, along with less power and speed, than cars with more cylinders.
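The trends read off the plot can also be checked numerically. The sketch below uses only base R and the unmodified mtcars data; it compares group means by cylinder count and computes the correlation between power-to-weight ratio and fuel efficiency. The helper column `pwr` is a hypothetical name introduced for this check and is not part of the original script.

```r
# Work on a copy of the built-in data; pwr is a hypothetical helper column
cars <- mtcars
cars$pwr <- cars$hp / cars$wt  # horsepower per 1,000 lbs

# Mean fuel efficiency, power-to-weight ratio and quarter-mile time per cylinder group
by_cyl <- aggregate(cbind(mpg, pwr, qsec) ~ cyl, data = cars, FUN = mean)
print(by_cyl)

# Direction and strength of the linear relationship between power-to-weight and mpg
r <- cor(cars$pwr, cars$mpg)
print(round(r, 2))
```

A negative `r` is consistent with the general trend described above, while its magnitude shows how tightly the individual cars follow it; a plot alone can make a loose relationship look stronger than it is.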

Spotting differences and deviation analysis work best when the data has at least one categorical variable. Without one, all observations form a single group and there is nothing to compare them against. Such a variable can also be derived: with time series data, for example, the dataset could be split in half to create "before" and "after" groups that can then be compared. The dataset I used already had a categorical variable of this kind (the cylinder count), so I included it alongside continuous numerical variables so that each group had values to compare.
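The idea of deriving a grouping variable from a time series can be sketched in base R. This example uses R's built-in `Nile` series (annual flow of the river Nile at Aswan, 1871 to 1970) purely as an illustrative time series, not data from this post. It splits the series at its midpoint into "before" and "after" groups, compares the group means, and then measures each year's deviation from its own group's mean. The variable names are hypothetical.

```r
# Nile is a built-in annual time series (datasets package), used only as an example
flow  <- as.numeric(Nile)
years <- as.numeric(time(Nile))

# Derive a categorical variable: first half of the series vs. second half
half   <- length(flow) / 2
period <- factor(ifelse(seq_along(flow) <= half, "before", "after"),
                 levels = c("before", "after"))

# Spotting differences: compare the two derived groups
group_means <- tapply(flow, period, mean)
print(group_means)

# Deviation analysis: distance of each year from its own group's mean
deviation <- flow - unname(group_means[as.character(period)])
nile_df <- data.frame(year = years, period = period, deviation = deviation)
head(nile_df)
```

Splitting exactly at the midpoint is the simplest choice; if a real event date is known, the cutoff would normally be placed at that date instead.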

Links:

GitHub: https://github.com/SimonLiles/LIS4317DataVisualization/blob/main/LIS4317Mod6.R