A
Consider a population consisting of the following values, which represent the number of ice cream purchases during the academic year for each of five housemates.
8, 14, 16, 10, 11
a. Compute the mean of this population.
The mean of the given population is 11.8.
b. Select a random sample of size 2 out of the five members.
The following R code draws two random values from the given vector and repeats the draw five times, producing five samples.
# Pick 5 random samples of 2 from the given data set
houseMates <- c(8, 14, 16, 10, 11)
rSampleDF <- data.frame()
for (i in 1:5) {
  Sample <- sample(houseMates, 2)  # draw 2 values without replacement
  rSampleDF <- rbind(rSampleDF, Sample)
}
When the code was run for this exercise, it created a data frame with two variables and five observations, one row for each of the five random samples. The output is shown in the table below.
 | Record 1 | Record 2 |
---|---|---|
Sample 1 | 14 | 11 |
Sample 2 | 8 | 14 |
Sample 3 | 10 | 14 |
Sample 4 | 11 | 10 |
Sample 5 | 8 | 14 |
c. Compute the mean and standard deviation of your sample.
For each of the samples above, the mean, standard deviation, and standard error can be calculated. I wrote a script so that the data stay consistent throughout and the results are easier to reproduce. The results are shown in the table below.
# Calculate Mean, Standard Deviation, and Standard Error for each sample
sampleMeanList <- list()
sampleSDList <- list()
sampleErrorList <- list()
for (i in 1:5) {
  v <- c(rSampleDF[i, 1], rSampleDF[i, 2])  # the two values in sample i
  sampleMean <- mean(v)
  sampleSD <- sd(v)                         # sample SD (n - 1 denominator)
  sampleError <- sampleSD / sqrt(2)         # standard error = sd / sqrt(n), n = 2
  sampleMeanList <- append(sampleMeanList, sampleMean)
  sampleSDList <- append(sampleSDList, sampleSD)
  sampleErrorList <- append(sampleErrorList, sampleError)
}
 | Mean | Standard Deviation | Standard Error |
---|---|---|---|
Sample 1 | 12.5 | 2.121 | 1.5 |
Sample 2 | 11 | 4.243 | 3 |
Sample 3 | 12 | 2.828 | 2 |
Sample 4 | 10.5 | 0.707 | 0.5 |
Sample 5 | 11 | 4.243 | 3 |
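As a cross-check on the table above, the same statistics can be computed without a loop. The following is a minimal sketch that hard-codes the five samples recorded earlier; the matrix name `recorded` and the result names are illustrative choices, not part of the original script.

```r
# Hypothetical re-creation of the five recorded samples, one sample per row
recorded <- matrix(c(14, 11,
                      8, 14,
                     10, 14,
                     11, 10,
                      8, 14),
                   ncol = 2, byrow = TRUE)

sampleMeans  <- rowMeans(recorded)                # mean of each sample
sampleSDs    <- apply(recorded, 1, sd)            # sample SD (n - 1 denominator)
sampleErrors <- sampleSDs / sqrt(ncol(recorded))  # standard error = sd / sqrt(n)

# Combine into one table, rounded to three decimals
round(cbind(Mean = sampleMeans, SD = sampleSDs, SE = sampleErrors), 3)
```

Working on whole rows at once avoids growing lists inside a loop, which may be easier to read for a data set this small.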
d. Compare the mean and standard deviation of your sample to those of the entire population (8, 14, 16, 10, 11).
The standard deviation of the given population can be calculated with the table method. The table method was used because it keeps the calculation simple and because the sd() function in R computes the sample standard deviation (dividing by n - 1), not the population standard deviation (dividing by N).
x | x - µ | (x - µ)^2 |
---|---|---|
8 | -3.8 | 14.44 |
14 | 2.2 | 4.84 |
16 | 4.2 | 17.64 |
10 | -1.8 | 3.24 |
11 | -0.8 | 0.64 |
 | Sum | 40.8 |
40.8 / 5 = 8.16
sqrt(8.16) = 2.857
The standard deviation of the population is 2.857. The mean calculated earlier was 11.8.
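The same calculation can be sketched in R. Because `sd()` divides by n - 1, the population formula is written out explicitly here; the variable names `popMean`, `popVar`, and `popSD` are illustrative.

```r
houseMates <- c(8, 14, 16, 10, 11)

popMean <- mean(houseMates)                 # population mean
popVar  <- mean((houseMates - popMean)^2)   # sum of squared deviations / N
popSD   <- sqrt(popVar)                     # population standard deviation

round(c(mean = popMean, variance = popVar, sd = popSD), 3)
```

The rounded values should match the table method: a mean of 11.8, a variance of 8.16, and a standard deviation of 2.857.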
Compared to the population parameters, the sample statistics scatter around the parameter values. The sample means (10.5 to 12.5) stay fairly close to the population mean of 11.8. The sample standard deviations vary far more, ranging from 0.707 to 4.243 against a population standard deviation of 2.857. One could say that samples approximate the mean and standard deviation of the population, and that the approximation generally gets closer to the population parameters as the sample size grows.
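With only five values in the population, every possible sample of size 2 can be listed, which makes this scatter concrete. The following is a minimal sketch using base R's `combn()`; the variable names are illustrative.

```r
houseMates <- c(8, 14, 16, 10, 11)

# Each column of samplePairs is one of the choose(5, 2) = 10 possible samples
samplePairs <- combn(houseMates, 2)

allMeans <- colMeans(samplePairs)       # mean of every possible sample
allSDs   <- apply(samplePairs, 2, sd)   # sample SD of every possible sample

range(allMeans)   # spread of the individual sample means
mean(allMeans)    # average of all sample means
range(allSDs)     # spread of the individual sample SDs
```

Individual sample means range from 9 to 15, yet their average equals the population mean of 11.8, which is why any single small sample can miss the parameter while still being centered on it.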
B
Suppose that the sample size n = 100 and the population proportion p = 0.95.
1. Does the sample proportion p̂ have approximately a normal distribution? Explain.
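The sampling distribution of the sample proportion is expected to be approximately normal if both np and nq are at least 5. Here p = 0.95 and q = 1 - p = 0.05, so np = 95 and nq = 5. Both conditions are met, although nq only just reaches the threshold, so we can expect the distribution to be approximately normal.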
2. What is the smallest value of n for which the sampling distribution of p̂ is approximately normal?
The smallest value of n for which the sampling distribution of the sample proportion is approximately normal is 100. With p = 0.95 and q = 0.05, any smaller n pushes nq below 5, and both np and nq need to be at least 5 to assume a normal distribution.
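The rule of thumb can also be checked numerically. The helper below is a hypothetical sketch, not part of the linked script. It assumes the threshold of 5 is inclusive (some textbooks use 10 instead) and adds a small tolerance to guard against floating-point rounding in 1 - p.

```r
# Hypothetical helper: TRUE when both n*p and n*q reach the threshold
isApproxNormal <- function(n, p, threshold = 5, tol = 1e-9) {
  q <- 1 - p
  (n * p >= threshold - tol) && (n * q >= threshold - tol)
}

isApproxNormal(100, 0.95)   # np = 95, nq = 5
isApproxNormal(99, 0.95)    # np = 94.05, nq = 4.95

# Search upward for the smallest n that satisfies the rule when p = 0.95
n <- 1
while (!isApproxNormal(n, 0.95)) n <- n + 1
n
```

Passing `threshold = 10` to the helper shows how the answer changes under the stricter version of the rule.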
I wrote a script to help solve some of these problems. It is available on GitHub: https://github.com/SimonLiles/LIS4273AdvStatistics/blob/master/LIS4273Mod5.R