LIS 4370 R Programming: Module 4 Programming Structure

This week I was given a set of data, shown in the table below. The R code I used this week is on my GitHub; the link is at the bottom of this post.

The data set below represents 10 patients who visited a hospital. For each patient, the frequency of visits during the year was recorded, along with blood pressure, the assessments of two doctors, and a final decision made by the head of emergency care.

Visit Frequency	Blood Pressure	First Doctor Decision	Second Doctor Decision	Final Decision
0.6	103	bad	low	low
0.3	87	bad	low	high
0.4	32	bad	high	low
0.4	42	bad	high	high
0.2	59	good	low	low
0.6	109	good	low	high
0.3	78	good	high	low
0.4	205	good	high	high
0.9	135	NA	high	high
0.2	176	bad	high	high

Table 1: Given data set

We can take this table and represent each column as a vector in R. This would look like the following.

# Variables in vector format
Freq <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)
bloodp <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
first <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)        # 1 = bad, 0 = good
second <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)        # 1 = high, 0 = low
finalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1) # 1 = high, 0 = low
# Measuring consensus among the three decisions by coding each as 1 or 0:
# a sum of 3 or 0 means complete agreement, a 1 or 2 means some disagreement.
agreement <- first + second + finalDecision
# Make it a data frame
my_data <- data.frame(Freq, bloodp, first, second, finalDecision, agreement)

I represented each decision as a 1 or a 0 (1 for bad or high, 0 for good or low) for two reasons. First, the decisions are binary; second, this makes it easier to measure the consensus among the doctors. I also add a sixth vector, agreement, which holds the sum of the three decisions for each patient. A 3 or a 0 indicates complete agreement, while a 1 or a 2 indicates some amount of disagreement. The ninth patient has no first assessment, so that patient's agreement is NA. Finally, I combine the vectors into a data frame to make the data easier to handle later on.
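As a quick check on the agreement idea, here is a minimal sketch that recomputes the sums and flags complete agreement. The name full_agreement is my own illustration and not part of the assignment, and the NA handling is one possible choice rather than the only one.

```r
# Doctor decisions coded as in the vectors above (1 = bad/high, 0 = good/low)
first <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
second <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
finalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)

# Sum of the three decisions; NA propagates for the patient with a missing value
agreement <- first + second + finalDecision

# Hypothetical flag: TRUE when all three decisions match (sum of 0 or 3)
full_agreement <- agreement %in% c(0, 3)
# %in% returns FALSE for NA, so restore NA where the sum is unknown
full_agreement[is.na(agreement)] <- NA

# Tally complete agreement, disagreement, and missing cases
table(full_agreement, useNA = "ifany")
```

Keeping the missing case as NA, instead of counting it as disagreement, avoids guessing what the first doctor would have said.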

Now that the data is prepared, let's do some exploratory plots! To begin, I will make a box plot and a histogram of visit frequency and of blood pressure with the following code:

#Plot of visit frequency
boxplot(my_data$Freq)
hist(my_data$Freq)
#Plot of blood pressures
boxplot(my_data$bloodp)
hist(my_data$bloodp)

This produces the following visualizations:

Figure 1: A box plot and Histogram of Hospital Visit Frequency for 10 patients.

Figure 2: A box plot and Histogram of Blood Pressure of 10 patients.

It is interesting to see a distribution similar to a Pareto distribution in Figure 1 above, where fewer patients visit the hospital more regularly, although that is to be expected. The visit frequency also leans heavily towards the lower end of the scale, as seen in the box plot. Figure 2 shows what may be a normal distribution; more data points would make this clearer. The box plot in Figure 2 seems to suggest a similar right skew.
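One rough way to check the skew read off the plots is to compare the mean and the median of each variable, since a mean above the median is consistent with a right skew. This is only a sketch, and with 10 points it cannot settle the shape of either distribution.

```r
# Same values as the Freq and bloodp vectors defined earlier
Freq <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)
bloodp <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)

# Mean and median side by side for each variable
freq_stats <- c(mean = mean(Freq), median = median(Freq))
bloodp_stats <- c(mean = mean(bloodp), median = median(bloodp))

print(freq_stats)
print(bloodp_stats)

# Five-number summary plus the mean, matching what the box plots draw
summary(Freq)
summary(bloodp)
```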

This data only looks at the patients themselves. What about what the doctors are saying about them? With the following R code, we can make histograms of their decisions.

#Plot of first doctor decision
hist(my_data$first)
#Plot of second doctor decision
hist(my_data$second)
#Plot of final decision
hist(my_data$finalDecision)

This gives the following graphs:

From this it would seem the doctors lean slightly towards a 1 (bad or high): the first doctor gives it to 5 of the 9 patients assessed, and the second doctor and the final decision each give it to 6 of 10. From just this, one might say that they have the same decision cycle. However, those counts are all in aggregate, so let’s break this down by comparing how well the doctors agree against the blood pressure of the patients. For the R code I could use either the built-in plotting function or the ggplot2 library. For the sake of this assignment I use both.
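Since each decision only takes the values 0 and 1, counting the values directly is an alternative to a histogram. The following is a minimal sketch using base R's table() and barplot(); the variable names ending in _counts are my own.

```r
# Doctor decisions coded as in the vectors above (1 = bad/high, 0 = good/low)
first <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
second <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
finalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)

# Count how often each coded value occurs (table() drops NA by default)
first_counts <- table(first)
second_counts <- table(second)
final_counts <- table(finalDecision)

# Report the missing first assessment separately
first_missing <- sum(is.na(first))

print(first_counts)
print(second_counts)
print(final_counts)
print(first_missing)

# One bar per category instead of histogram bins
barplot(first_counts, main = "First doctor decision", ylab = "Patients")
```

A bar plot of counts gives one bar per decision value, which can be easier to read than histogram bins spread between 0 and 1.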

#Plot bloodp against agreement of the doctors
plot(my_data$bloodp, my_data$agreement, type = "h")
#ggplot style
library(ggplot2)
ggplot(my_data, aes(bloodp, agreement)) + 
  geom_col()

This produces these graphics:

Figure 4: Bar chart of Blood Pressure versus Level of agreement of the doctors

From this it would appear that the doctors agree that patients are in need of care when their blood pressure is either high or low, but they may disagree when the patient has a healthier blood pressure. This matches the expectation that doctors are more likely to freak out when a patient has abnormal measurements or symptoms, such as very high or very low blood pressure, and are more likely to disagree when a patient is closer to healthy.
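To read the same pattern without a chart, one option is to sort the patients by blood pressure and look at the agreement column next to it. This is a small sketch that rebuilds only the two columns it needs; the name sorted_data is my own.

```r
# Blood pressure and doctor decisions, as in the vectors defined earlier
bloodp <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
first <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
second <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
finalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)

# Sum of the three decisions per patient (NA where a decision is missing)
agreement <- first + second + finalDecision

# Order the rows from lowest to highest blood pressure
sorted_data <- data.frame(bloodp, agreement)
sorted_data <- sorted_data[order(sorted_data$bloodp), ]
rownames(sorted_data) <- NULL

print(sorted_data)
```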

Links

GitHub: https://github.com/SimonLiles/LIS4370RProgramming/blob/main/LIS4370Mod4.R