LIS 4273 Adv Statistics: Module 8 Hypothesis Testing and Correlation Analysis

First Question: The director of manufacturing at a cookie factory needs to determine whether a new machine is producing a particular type of cookie according to the manufacturer's specifications, which indicate that the cookies should have a mean breaking strength of 70 pounds and a standard deviation of 3.5 pounds. A sample of 49 cookies reveals a sample mean breaking strength of 69.1 pounds.

A. State the null and alternative hypotheses

The null hypothesis is that the mean breaking strength of the cookies produced by the new machine is 70 pounds (H0: μ = 70). The alternative hypothesis is that the mean breaking strength of the cookies produced by the new machine is not 70 pounds (H1: μ ≠ 70).

B. Is there evidence that the machine is not meeting the manufacturer's specifications for average strength? Use a 0.05 level of significance.

We have a sample mean of 69.1 pounds, a population standard deviation of 3.5 pounds, a sample size of 49, and an alpha of 0.05. Because the population standard deviation is known, we use the one-sample z-statistic, z = (x̄ – μ) / (σ / sqrt(n)):

z = (69.1 – 70) / (3.5 / sqrt(49))

z = -1.8

Using a z-table or the qnorm() function in R with an alpha of 0.05, the critical values for a two-tailed test are z = ±1.96. Because the z-statistic of -1.8 falls between these values, the null hypothesis is not rejected: there is not enough evidence at the 0.05 level to conclude that the machine is failing to meet the manufacturer's specifications for average strength. This does not prove that the machine meets the specifications; the sample is simply consistent with a mean of 70 pounds.
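A minimal base R sketch of the test in part (B). The variable names are illustrative and not from the original; qnorm(1 - alpha / 2) supplies the two-tailed critical value.

```r
# Two-tailed one-sample z-test for part (B), using the values given in the problem.
x_bar <- 69.1   # sample mean breaking strength (pounds)
mu0   <- 70     # hypothesized mean under H0 (pounds)
sigma <- 3.5    # population standard deviation (pounds)
n     <- 49     # sample size
alpha <- 0.05   # significance level

# Test statistic: z = (x_bar - mu0) / (sigma / sqrt(n))
z <- (x_bar - mu0) / (sigma / sqrt(n))

# Two-tailed critical value, about 1.96 for alpha = 0.05
z_crit <- qnorm(1 - alpha / 2)

# Reject H0 only if |z| exceeds the critical value
reject_h0 <- abs(z) > z_crit
cat("z =", round(z, 2), " critical = +/-", round(z_crit, 2), " reject H0:", reject_h0, "\n")
```

Running it reports z = -1.8 against critical values of ±1.96, so H0 is not rejected.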

C. Compute the p value and interpret its meaning.

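Using the pnorm() function in R with a z-score of -1.8 and multiplying the result by 2 for the two-tailed test gives a p-value of about 0.072. This means that if the true mean breaking strength were 70 pounds, there would be about a 7.2% chance of obtaining a sample mean at least as far from 70 as 69.1 pounds, in either direction. Since the p-value is greater than alpha (0.05), the null hypothesis is not rejected.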
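The same p-value can be reproduced in base R. Writing it as 2 * pnorm(-abs(z)) is a stylistic choice for this sketch rather than the original code: it gives the correct two-tailed p-value whether z is negative or positive, while 2 * pnorm(z) only works when z is negative.

```r
# Two-tailed p-value for part (C): P(|Z| >= |z|) when H0 is true.
z <- (69.1 - 70) / (3.5 / sqrt(49))   # test statistic from part (B), about -1.8
p_value <- 2 * pnorm(-abs(z))          # lower-tail area, doubled for two tails
alpha <- 0.05

cat("p-value =", round(p_value, 3), "\n")
if (p_value < alpha) {
  cat("Reject H0 at alpha = 0.05\n")
} else {
  cat("Do not reject H0 at alpha = 0.05\n")
}
```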

D. What would be your answer in (B) if the standard deviation were specified as 1.75 pounds?

If the standard deviation were specified as 1.75 pounds, then the equation would be filled in as follows:

z = (69.1 – 70) / (1.75 / sqrt(49))

z = -3.6

Alpha is unchanged, so the critical values remain ±1.96. Halving the standard deviation halves the standard error (from 0.5 to 0.25), which doubles the z-score to -3.6. This falls outside the critical values, so the null hypothesis is rejected: there is evidence that the machine is not meeting the manufacturer's specifications.
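To see the effect of the smaller σ directly, here is a sketch built around a small hypothetical helper, z_test() (written for this example, not a base R function), that bundles the z-statistic, the two-tailed p-value, and the decision.

```r
# Hypothetical helper: one-sample two-tailed z-test with known sigma.
z_test <- function(x_bar, mu0, sigma, n, alpha = 0.05) {
  z <- (x_bar - mu0) / (sigma / sqrt(n))   # standardized distance from mu0
  p <- 2 * pnorm(-abs(z))                  # two-tailed p-value
  list(z = z, p_value = p, reject = abs(z) > qnorm(1 - alpha / 2))
}

# Part (D): same sample mean, but sigma = 1.75 pounds halves the standard error.
res_d <- z_test(x_bar = 69.1, mu0 = 70, sigma = 1.75, n = 49)
print(res_d)
```

The p-value comes out near 0.0003, far below 0.05, which agrees with the rejection above.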

E. What would be your answer in (B) if the sample mean were 69 pounds and the standard deviation were 3.5 pounds?

If the sample mean were 69 pounds and the standard deviation were 3.5 pounds, the equation would be filled in as follows:

z = (69 – 70) / (3.5 / sqrt(49))

z = -2

Alpha is unchanged, so the critical values remain ±1.96. A z-score of -2 falls just outside the lower critical value of -1.96, so the null hypothesis is rejected: there is evidence that the machine is not meeting the manufacturer's specifications, although the result is close to the boundary.
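Because z = -2 sits so close to -1.96, the p-value is a useful check. This base R sketch (illustrative variable names) computes both: p is about 0.0455, only slightly below alpha = 0.05, so a slightly different sample mean could change the decision.

```r
# Part (E): sample mean 69 pounds, sigma back to 3.5 pounds.
x_bar <- 69; mu0 <- 70; sigma <- 3.5; n <- 49; alpha <- 0.05

z <- (x_bar - mu0) / (sigma / sqrt(n))   # -1 / 0.5 = -2
p_value <- 2 * pnorm(-abs(z))            # two-tailed p-value, just under alpha
z_crit <- qnorm(1 - alpha / 2)           # about 1.96

cat("z =", z, " p-value =", round(p_value, 4), " reject H0:", abs(z) > z_crit, "\n")
```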

Second Question:
If x̄ = 85, σ (the population standard deviation) = 8, and n = 64, set up a 95% confidence interval estimate of the population mean μ.

For a 95% confidence interval, the critical z-value is 1.96. Plugging the other values into x̄ ± z(σ / sqrt(n)) gives:

85 – 1.96(8 / sqrt(64)) < μ < 85 + 1.96(8 / sqrt(64))

85 – 1.96 < μ < 85 + 1.96

83.04 < μ < 86.96

Thus the 95% confidence interval is (83.04, 86.96). We can be 95% confident that the population mean μ lies between these two values: if this procedure were repeated on many samples, about 95% of the intervals it produces would contain μ.
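The interval can be checked in base R. This sketch uses qnorm(0.975), about 1.959964, instead of the rounded 1.96, which does not change the interval at two decimal places.

```r
# 95% confidence interval for mu with known sigma: x_bar +/- z * sigma / sqrt(n)
x_bar <- 85
sigma <- 8
n     <- 64
conf  <- 0.95

z_star <- qnorm(1 - (1 - conf) / 2)      # two-tailed critical value, about 1.96
margin <- z_star * sigma / sqrt(n)       # sigma / sqrt(n) = 8 / 8 = 1
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
print(round(ci, 2))
```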

Third Question: Correlation Analysis
The formula for the correlation coefficient is:

r = [nΣxy – (Σx)(Σy)] / sqrt([nΣx² – (Σx)²][nΣy² – (Σy)²])

r: The correlation coefficient is denoted by the letter r.

n: Number of paired values. If we were calculating the correlation coefficient for five people, n would be 5.

x: This is the first data variable.

y: This is the second data variable.

Σ: The Sigma symbol (Greek) tells us to calculate the “sum of” whatever is tagged next to it.
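As a check on the formula, this sketch implements the sums formula directly as a hypothetical helper, r_from_sums() (not part of base R), and compares it with base R's cor() on the Total row of Table 1 (x = girls, y = boys).

```r
# Pearson r via the computational (sums) formula:
# r = [n*sum(xy) - sum(x)*sum(y)] / sqrt([n*sum(x^2) - sum(x)^2] * [n*sum(y^2) - sum(y)^2])
# r_from_sums() is a hypothetical helper written for this example, not a base R function.
r_from_sums <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  numerator   <- n * sum(x * y) - sum(x) * sum(y)
  denominator <- sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
  numerator / denominator
}

# Total row of Table 1: x = girls, y = boys
x <- c(96, 113, 141)
y <- c(95.9, 113, 141)

r_manual  <- r_from_sums(x, y)
r_builtin <- cor(x, y)   # base R; Pearson is the default method
cat("sums formula:", r_manual, " cor():", r_builtin, "\n")
```

Both values agree (about 0.9999988), since the sums formula is an algebraic rearrangement of Pearson's r.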
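In R:
x <- c(your_data)    # first variable (replace your_data with numeric values)
y <- c(your_data)    # second variable, same length as x
z <- c(your_data)    # third variable, same length as x
df <- data.frame(x, y, z)
plot(df)                          # scatterplot matrix of every pair of variables
cor(df)                           # correlation matrix (Pearson by default)
cor(df, method = "pearson")       # As Pearson correlation
cor(df, method = "spearman")      # As Spearman correlation
library(corrgram)                 # corrgram package from CRAN
corrgram(df)                      # plot a correlogram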

Measure                     Girl 1  Girl 2  Girl 3   Boy 1   Boy 2   Boy 3
Goals                            4       5       6       4       5       6
Grades                          49      50      69    46.1    54.2    67.7
Popular                         24      36      38    26.9    31.6    39.5
Time Spent on Assignment        14      22      28    18.9    22.2    27.8
Total                           96     113     141    95.9     113     141
Table 1: Given Data Set
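The R code in this section reads from an object called data_set whose creation is not shown. One way to enter Table 1 so that data_set$Girl1[5] returns Girl 1's total is sketched below; the column names Girl1 to Boy3 come from that code, while the Measure column is an assumption added for readability.

```r
# Table 1 entered as a data frame: one column per student, one row per measure.
data_set <- data.frame(
  Measure = c("Goals", "Grades", "Popular", "Time Spent on Assignment", "Total"),
  Girl1 = c(4, 49, 24, 14, 96),
  Girl2 = c(5, 50, 36, 22, 113),
  Girl3 = c(6, 69, 38, 28, 141),
  Boy1  = c(4, 46.1, 26.9, 18.9, 95.9),
  Boy2  = c(5, 54.2, 31.6, 22.2, 113),
  Boy3  = c(6, 67.7, 39.5, 27.8, 141)
)

# Row 5 is the Total row, so data_set$Girl1[5] is Girl 1's total
print(data_set)
```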

Your assignment for Correlation Analysis
The accompanying data are x = girls and y = boys (goals, time spent on assignment).

From the data set, the Total row was selected, and the girls' values were compared with the boys' values. The R code used was as follows:

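# data_set holds Table 1 (columns Girl1 ... Boy3); row 5 is the Total row
x <- c(data_set$Girl1[5], data_set$Girl2[5], data_set$Girl3[5])   # girls' totals
y <- c(data_set$Boy1[5], data_set$Boy2[5], data_set$Boy3[5])      # boys' totals
df <- data.frame(x, y)
plot(x, y)                          # scatterplot of boys' vs girls' totals
cor(x, y)                           # Pearson correlation (the default method)
cor(x, y, method = "pearson")       # As Pearson correlation
cor(x, y, method = "spearman")      # As Spearman correlation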

a. Calculate the correlation coefficient for this data set

The correlation coefficient for the selected data is r ≈ 0.999999 (cor() returns 0.9999988). This indicates a very strong positive correlation: across the three pairs, the boys' totals increase almost exactly in step with the girls' totals. With only three pairs of values, however, this result should be interpreted cautiously.

b. Pearson correlation coefficient

The Pearson correlation coefficient for the selected data is the same, r ≈ 0.999999, because cor() computes the Pearson coefficient by default. It again indicates a very strong positive correlation between the girls' and boys' totals.
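Because the code above also requests a Spearman correlation, this sketch compares the two methods on the totals. Spearman correlates ranks, and the girls' and boys' totals are in the same increasing order, so the Spearman coefficient is exactly 1.

```r
# Pearson uses the raw values; Spearman correlates their ranks.
x <- c(96, 113, 141)      # girls' totals (Table 1)
y <- c(95.9, 113, 141)    # boys' totals (Table 1)

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
default  <- cor(x, y)     # method defaults to "pearson"

cat("Pearson:", pearson, " Spearman:", spearman, " default:", default, "\n")
```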

c. Create a plot of the correlation

A correlogram was created for the entire data set with corrgram(). Darker blue indicates a stronger positive correlation, and darker red indicates a stronger negative correlation. The result is as follows:

Plot 1: Correlogram of the given data set.
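A sketch of how such a correlogram might be produced. It assumes that "the entire data set" means the six students (the columns of Table 1) treated as variables and correlated across the five measures; the original does not say which orientation was used. corrgram() comes from the CRAN corrgram package and is called only if that package is installed, while the base R cor() matrix is printed either way.

```r
# Six students (columns) measured on the five rows of Table 1.
students <- data.frame(
  Girl1 = c(4, 49, 24, 14, 96),
  Girl2 = c(5, 50, 36, 22, 113),
  Girl3 = c(6, 69, 38, 28, 141),
  Boy1  = c(4, 46.1, 26.9, 18.9, 95.9),
  Boy2  = c(5, 54.2, 31.6, 22.2, 113),
  Boy3  = c(6, 67.7, 39.5, 27.8, 141)
)

cor_matrix <- cor(students)          # 6 x 6 matrix of Pearson correlations
print(round(cor_matrix, 3))

# Draw the correlogram only if the corrgram package is available.
if (requireNamespace("corrgram", quietly = TRUE)) {
  corrgram::corrgram(students,
                     lower.panel = corrgram::panel.shade,  # shaded squares: blue positive, red negative
                     upper.panel = corrgram::panel.pie)    # pie slice size reflects |r|
}
```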