LIS 4370 R Programming: Data Frames in R

This week I learned more about data frames in R. For this post I was given hypothetical election polling data. The data was originally supplied as vectors, so I have kept it that way. For this assignment I wrote an R script, which you can find on my GitHub via the link at the bottom of this post.

# Load sample data set from the assignment page
#  The data set is meant to resemble 2016 presidential election polling, but it
#  has been made up and does not reflect what actually happened
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")

ABC_poll <- c(4, 62, 51, 21, 2, 14, 15)
CBS_poll <- c(12, 75, 43, 19, 1, 21, 19)

Turning vectors like these into a data frame is straightforward with data.frame(). The R code looks like the following:

# Store in a dataframe
poll_results <- data.frame(Name, ABC_poll, CBS_poll)
poll_results

The data frame is small enough that I can print it to the console. This is how R displays it:

     Name ABC_poll CBS_poll
1     Jeb        4       12
2  Donald       62       75
3     Ted       51       43
4   Marco       21       19
5   Carly        2        1
6 Hillary       14       21
7  Berine       15       19

With a larger data frame I would use head() or str() to inspect its structure in the console instead of printing the whole thing.
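As a minimal sketch of that kind of inspection, the snippet below rebuilds the same sample data frame and calls head(), str(), and dim() on it. The name first_rows and the choice of n = 3 are my own for illustration and are not part of the assignment script.

```r
# Rebuild the sample data frame from this post so the snippet runs on its own
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC_poll <- c(4, 62, 51, 21, 2, 14, 15)
CBS_poll <- c(12, 75, 43, 19, 1, 21, 19)
poll_results <- data.frame(Name, ABC_poll, CBS_poll)

# First three rows only; n = 3 is an arbitrary choice for illustration
first_rows <- head(poll_results, n = 3)
first_rows

# Compact summary: each column's name, type, and first few values
str(poll_results)

# Number of rows and columns
dim(poll_results)
```

With only seven rows this is overkill, but the same three calls work unchanged on a data frame with thousands of rows.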

We can also see how each candidate ranks in each poll using the following R code:

# Rank by CBS results
poll_results_CBS_ranked <- poll_results[order(-poll_results$CBS_poll), ]
poll_results_CBS_ranked

# Rank by ABC results
poll_results_ABC_ranked <- poll_results[order(-poll_results$ABC_poll), ]
poll_results_ABC_ranked

The CBS ranked poll looks like the following:

     Name ABC_poll CBS_poll
2  Donald       62       75
3     Ted       51       43
6 Hillary       14       21
4   Marco       21       19
7  Berine       15       19
1     Jeb        4       12
5   Carly        2        1

And the ABC ranked poll will be this:

     Name ABC_poll CBS_poll
2  Donald       62       75
3     Ted       51       43
4   Marco       21       19
7  Berine       15       19
6 Hillary       14       21
1     Jeb        4       12
5   Carly        2        1
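Sorting with order() shows the sequence, but it does not flag ties: Marco and Berine both have 19 in the CBS poll, yet one still has to be listed before the other. A possible way to make the ranks explicit, sketched here with base R's rank(), is to add rank columns. The column names CBS_rank and ABC_rank are my own and do not appear in the assignment script.

```r
# Rebuild the sample data frame from this post so the snippet runs on its own
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC_poll <- c(4, 62, 51, 21, 2, 14, 15)
CBS_poll <- c(12, 75, 43, 19, 1, 21, 19)
poll_results <- data.frame(Name, ABC_poll, CBS_poll)

# Ranking the negated values gives rank 1 to the highest poll number;
# ties.method = "min" gives tied candidates the same (lowest) rank
poll_results$CBS_rank <- rank(-poll_results$CBS_poll, ties.method = "min")
poll_results$ABC_rank <- rank(-poll_results$ABC_poll, ties.method = "min")

# Show the data frame sorted by CBS rank
poll_results[order(poll_results$CBS_rank), ]
```

With ties.method = "min", Marco and Berine share rank 4 in the CBS poll and the next candidate gets rank 6, the same convention used for shared places in sports standings.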

The sorted tables make the rankings easy to see, but the size of the gaps between candidates, and between the two polls, is harder to discern. So let us plot it!

# Load in libraries
library(ggplot2)
library(reshape2)

To plot it, we first reshape the data frame into long format, with one row per candidate and poll, and then use ggplot to draw a grouped bar chart of the two poll results.

# Melt the dataframe into long format to make plotting easier
poll_results_melted <- melt(poll_results, id.vars = "Name")

# Plot using ggplot
ggplot(poll_results_melted, aes(Name, value, fill = variable)) + 
  geom_col(position = "dodge")
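To make the long format less abstract, here is a rough base R sketch that builds a data frame of roughly the same shape by hand, with the columns Name, variable, and value that the plotting code refers to. This is not how melt() works internally, and long_base is a name I made up for illustration. Column types may also differ slightly from what melt() returns.

```r
# Rebuild the sample data frame from this post so the snippet runs on its own
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC_poll <- c(4, 62, 51, 21, 2, 14, 15)
CBS_poll <- c(12, 75, 43, 19, 1, 21, 19)
poll_results <- data.frame(Name, ABC_poll, CBS_poll)

# Hand-built long format: one row per candidate and poll
long_base <- data.frame(
  Name     = rep(poll_results$Name, times = 2),                      # names repeated once per poll
  variable = rep(c("ABC_poll", "CBS_poll"), each = nrow(poll_results)), # which poll the row is from
  value    = c(poll_results$ABC_poll, poll_results$CBS_poll)         # the poll numbers, stacked
)

long_base
```

The seven-row wide table becomes fourteen rows, which is what lets ggplot map the poll to the fill colour with a single column.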

The following bar chart is then produced:

Figure 1: Hypothetical poll results from 2016 election.

Now it is easy to see not just the rankings in each poll but also the differences between the polls. The same candidates sit at the top (Donald, then Ted) and at the bottom (Jeb, then Carly) of both polls. The difference is in the middle: in the CBS poll, Hillary, Marco, and Berine are almost tied for third (21, 19, and 19), while in the ABC poll Marco holds third more clearly (21) and Berine and Hillary (15 and 14) are competing for fourth.
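The differences read off the chart can also be computed directly. The sketch below adds a column with the CBS number minus the ABC number for each candidate and sorts by the size of that gap. The column name CBS_minus_ABC is my own and is not part of the assignment script.

```r
# Rebuild the sample data frame from this post so the snippet runs on its own
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC_poll <- c(4, 62, 51, 21, 2, 14, 15)
CBS_poll <- c(12, 75, 43, 19, 1, 21, 19)
poll_results <- data.frame(Name, ABC_poll, CBS_poll)

# Positive values mean the candidate polled higher with CBS than with ABC
poll_results$CBS_minus_ABC <- poll_results$CBS_poll - poll_results$ABC_poll

# Sort by the absolute size of the gap, largest first
poll_results[order(-abs(poll_results$CBS_minus_ABC)), ]
```

Sorting on the absolute value keeps large gaps in either direction at the top, regardless of which poll was higher.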

Links:

GitHub: https://github.com/SimonLiles/LIS4370RProgramming/blob/main/LIS4370Mod3.R