The data is from a study whose research hypothesis was: “There will be a difference between boys and girls in the number of times they raise their hand in class.” For the purposes of this study, boys were coded as 2 on gender; girls were coded as 1. I wrote some R code that is included in this post and on my GitHub, which is linked at the bottom. The data given in the assignment is as follows:
Gender | Hand Up |
---|---|
1 | 9 |
2 | 11 |
2 | 1 |
1 | 8 |
2 | 2 |
2 | 6 |
1 | 3 |
2 | 4 |
1 | 8 |
2 | 3 |
1 | 10 |
2 | 6 |
This data can be analyzed using R; however, to run the t-test we first need to do some munging so that the table is split into two vectors, one per group. While I could go through by hand and separate the two groups in the data, it is easier to create a data frame in R and then pull out the two groups and store them as vectors. A Welch Two Sample t-test is performed because the researchers' hypothesis is about a difference between two independent groups, and the population variances are unknown and not assumed to be equal. The code is as follows:
# Columns
gender <- c(1,2,2,1,2,2,1,2,1,2,1,2)
hand_up <- c(9,11,1,8,2,6,3,4,8,3,10,6)
# Creating the dataframe to hold the raw data
raw_data <- data.frame(gender,hand_up)
# Pulling out two vectors from data frame based on gender
# Girls have been coded as 1, boys have been coded as 2
girls <- raw_data$hand_up[raw_data$gender == 1]
boys <- raw_data$hand_up[raw_data$gender == 2]
# Welch Two Sample t-test (t.test() does not assume equal variances by default)
t.test(girls, boys)
Running the Welch Two Sample t-test in R gives the following output:
Welch Two Sample t-test
data: girls and boys
t = 1.6482, df = 9.7633, p-value = 0.1311
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.028259 6.799687
sample estimates:
mean of x mean of y
7.600000 4.714286
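As an aside on the munging step, the split into two vectors is not strictly required. Here is a minimal sketch using the formula interface of `t.test()`, which takes the data frame directly. The name `result_formula` is only for illustration; the data is rebuilt so the snippet runs on its own.

```r
# Same data as in the post, rebuilt so this snippet is self-contained
gender  <- c(1, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2)
hand_up <- c(9, 11, 1, 8, 2, 6, 3, 4, 8, 3, 10, 6)
raw_data <- data.frame(gender, hand_up)

# Formula interface: outcome ~ grouping variable.
# Groups are ordered by their code, so the difference is still
# girls (1) minus boys (2), matching t.test(girls, boys).
result_formula <- t.test(hand_up ~ gender, data = raw_data)
print(result_formula)
```

The t statistic, degrees of freedom, and p-value should match the output of the two-vector call; only the labels in the printout differ.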
With this output I can answer the questions given by my professor for the assignment.
1. Find your two sample means.
The sample mean for the girls is 7.6; in other words, the girls in this sample raised their hands an average of 7.6 times during class. The boys in this sample raised their hands an average of 4.71 times during class.
2. Find the degrees of freedom(s).
For a one-sample t-test the degrees of freedom are simply n − 1. For the Welch Two Sample t-test, however, R uses the Welch–Satterthwaite equation to find the degrees of freedom, which usually gives a non-integer value. You can read about it briefly here. For this t-test the degrees of freedom are 9.763.
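To make the Welch–Satterthwaite step less of a black box, here is a rough sketch that computes the degrees of freedom by hand from the two groups. The variable names (`v_girls`, `v_boys`, `df_welch`) are my own and not part of the original script.

```r
# The two groups, typed out so this snippet runs on its own
girls <- c(9, 8, 3, 8, 10)
boys  <- c(11, 1, 2, 6, 4, 3, 6)

# Each group's sample variance divided by its sample size
v_girls <- var(girls) / length(girls)
v_boys  <- var(boys)  / length(boys)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (v_girls + v_boys)^2 /
  (v_girls^2 / (length(girls) - 1) + v_boys^2 / (length(boys) - 1))

df_welch
```

The result should agree with the `df` value that `t.test(girls, boys)` reports.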
3. Find t-test statistic score (s).
The calculated t statistic is 1.6482. It is the difference between the two sample means (girls minus boys) divided by the standard error of that difference, so the positive sign means the girls' sample mean is higher than the boys'. On its own the t statistic does not mean much; it is used together with the degrees of freedom to calculate the p-value.
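As a quick check on where that number comes from, the t statistic can be rebuilt by hand. This is only a sketch; `mean_diff`, `se_diff`, and `t_stat` are names I made up for it.

```r
# The two groups, typed out so this snippet runs on its own
girls <- c(9, 8, 3, 8, 10)
boys  <- c(11, 1, 2, 6, 4, 3, 6)

# Difference between the sample means (girls minus boys)
mean_diff <- mean(girls) - mean(boys)

# Standard error of that difference, without assuming equal variances
se_diff <- sqrt(var(girls) / length(girls) + var(boys) / length(boys))

# Welch t statistic
t_stat <- mean_diff / se_diff
t_stat
```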
4. Find the P value (s).
For this data set the p-value is 0.1311. This means that if there were truly no difference between the two population means and the study were repeated many times, about 13.11% of those repetitions would show a difference between the sample means at least as extreme as the one observed here.
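The p-value can also be recovered from the t statistic and the degrees of freedom using the t distribution function `pt()`. The sketch below plugs in the rounded values from the output, so the result is only approximate; `t_stat`, `df_welch`, and `p_value` are illustrative names.

```r
# Rounded values copied from the t.test() output
t_stat   <- 1.6482
df_welch <- 9.7633

# Two-tailed p-value: probability in both tails beyond |t|
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
p_value
```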
5. Assume you had chosen an alpha value of .05. Would this result have been statistically significant?
If I had chosen an alpha of 0.05 this result would not have been statistically significant as the p-value is greater than the alpha. To be statistically significant the p-value would need to be less than the alpha.
6. What critical value would your obtained t-test value have to exceed to be significant at the .01 level (assume a two-tailed test)?
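For an alpha of 0.01 and a two-tailed test, the critical values come from the t distribution with 9.7633 degrees of freedom and are about ±3.19 (in R, qt(0.995, df = 9.7633)). The obtained t statistic of 1.6482 would have to exceed 3.19 in absolute value to be significant at the .01 level, and it does not. The values -2.69 and 8.46 that appear in the output further down are not critical values; they are the bounds of the 99% confidence interval for the difference between the means. Because that interval contains 0, it leads to the same conclusion. The interval can be found with the following line of code: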
t.test(girls, boys, conf.level = 0.99)
The output is similar to before; the confidence interval line now gives the interval for the specified confidence level.
Welch Two Sample t-test
data: girls and boys
t = 1.6482, df = 9.7633, p-value = 0.1311
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
-2.693452 8.464881
sample estimates:
mean of x mean of y
7.600000 4.714286
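To tie the critical value and the 99% confidence interval together, here is a small sketch that gets the two-tailed critical t from `qt()` and then rebuilds the interval from it. The names `fit`, `t_crit`, `se_diff`, and `ci_99` are my own and are only there for illustration.

```r
# The two groups, typed out so this snippet runs on its own
girls <- c(9, 8, 3, 8, 10)
boys  <- c(11, 1, 2, 6, 4, 3, 6)

fit      <- t.test(girls, boys, conf.level = 0.99)
df_welch <- unname(fit$parameter)

# Two-tailed critical value at alpha = 0.01 (0.005 in each tail)
alpha  <- 0.01
t_crit <- qt(1 - alpha / 2, df = df_welch)

# Rebuild the 99% interval: difference in means +/- t_crit * standard error
se_diff <- sqrt(var(girls) / length(girls) + var(boys) / length(boys))
ci_99   <- (mean(girls) - mean(boys)) + c(-1, 1) * t_crit * se_diff

t_crit
ci_99

# Significant at the .01 level only if |t| exceeds the critical value
abs(unname(fit$statistic)) > t_crit
```

The last line compares the obtained t statistic to the critical value directly, which is the comparison the question is asking about.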
Relevant Links:
GitHub: https://github.com/SimonLiles/LIS4273AdvStatistics/blob/master/LIS4273Mod9.R