For Module 3 a single data set is given, and a bivariate analysis must be performed on it. The data comes from airport security records and gives the number of pre-boarding screenings performed and the number of security violations detected from 1988 to 1999. The table below is a random selection of 20 cases. To do the analysis I wrote an R script, which is linked at the end of this post.
Case | Pre-boarding Screenings | Security Violations Detected |
---|---|---|
1 | 287 | 271 |
2 | 243 | 261 |
3 | 237 | 230 |
4 | 227 | 225 |
5 | 247 | 236 |
6 | 264 | 252 |
7 | 247 | 243 |
8 | 247 | 247 |
9 | 251 | 238 |
10 | 254 | 274 |
11 | 277 | 256 |
12 | 303 | 305 |
13 | 285 | 273 |
14 | 254 | 234 |
15 | 280 | 261 |
16 | 264 | 265 |
17 | 261 | 241 |
18 | 292 | 292 |
19 | 248 | 228 |
20 | 253 | 252 |
From the raw data alone it is difficult to draw a conclusion about the trend, although it appears that as more pre-boarding screenings are performed, more security violations are detected. To clarify the relationship between the two variables, we can start by calculating the Pearson sample correlation coefficient and Spearman’s rank correlation coefficient. While hand calculations can be fun, R has built-in functions that make it very easy to find these values. While we are calculating these values, we might as well create a scatterplot to make the trend in the data easier to see. The output from the R script is below.
Statistic | Value |
---|---|
Pearson Sample Correlation Coefficient | 0.8375321 |
Spearman’s Rank Correlation Coefficient | 0.7575423 |
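A minimal sketch of how these two values and the scatterplot can be obtained in R, assuming the 20 cases from the table are typed in as two vectors. The variable names `screenings` and `violations` are illustrative and may differ from those used in the linked script.

```r
# Data transcribed from the 20-case table above
screenings <- c(287, 243, 237, 227, 247, 264, 247, 247, 251, 254,
                277, 303, 285, 254, 280, 264, 261, 292, 248, 253)
violations <- c(271, 261, 230, 225, 236, 252, 243, 247, 238, 274,
                256, 305, 273, 234, 261, 265, 241, 292, 228, 252)

# Pearson sample correlation coefficient (linear association)
pearson <- cor(screenings, violations, method = "pearson")

# Spearman's rank correlation coefficient (monotonic association;
# tied values receive averaged ranks)
spearman <- cor(screenings, violations, method = "spearman")

print(pearson)
print(spearman)

# Scatterplot of the two variables
plot(screenings, violations,
     xlab = "Pre-boarding Screenings",
     ylab = "Security Violations Detected",
     main = "Screenings vs. Security Violations")
```

Run on the table data, this should reproduce the two coefficients reported above. Spearman’s coefficient comes out lower than Pearson’s here because it only uses the rank order of the cases, not the actual distances between the values.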
From the Pearson sample correlation coefficient and Spearman’s rank correlation coefficient we can see that the data has a strong, direct (positive) correlation. We can conclude that, in general, an increase in pre-boarding screenings is associated with an increase in security violations detected.
On its own this result is not worth much: whenever a population is surveyed for a particular trait, the number found with that trait will generally increase as sampling increases. However, for the airport this is a good start toward creating a baseline for security violations. If the airport then sees a decrease in security violations without an associated and expected decrease in screenings, it can question whether its procedures are still effective. Or maybe the decrease was expected because the airport improved its communications program to passengers regarding security rules, in which case the decrease from baseline can be seen as a success.
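One possible way to make the baseline idea concrete is to fit a simple least-squares line to the 20 cases and compare a new observation against it. This is only a sketch under several assumptions: a linear fit is not part of the analysis above, the new observation (260 screenings, 220 violations) is invented for illustration, and the cutoff of two residual standard deviations is an arbitrary choice.

```r
# Data transcribed from the 20-case table above
screenings <- c(287, 243, 237, 227, 247, 264, 247, 247, 251, 254,
                277, 303, 285, 254, 280, 264, 261, 292, 248, 253)
violations <- c(271, 261, 230, 225, 236, 252, 243, 247, 238, 274,
                256, 305, 273, 234, 261, 265, 241, 292, 228, 252)

# Baseline: least-squares line of violations on screenings
fit <- lm(violations ~ screenings)
resid_sd <- summary(fit)$sigma  # residual standard deviation of the fit

# Hypothetical new period (made-up numbers, not from the data set)
new_screenings <- 260
new_violations <- 220

# Violations the baseline would expect for that many screenings
expected <- unname(predict(fit, newdata = data.frame(screenings = new_screenings)))

# Flag the period if it falls more than two residual SDs below the baseline
# (the factor of 2 is an illustrative assumption)
below_baseline <- (new_violations - expected) < -2 * resid_sd

print(coef(fit))
print(expected)
print(below_baseline)
```

A flagged period does not say why violations dropped. As noted above, it could mean screening has become less effective or that passenger communication has improved, so the flag is a prompt to investigate, not a verdict.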
See my Code on GitHub:
https://github.com/SimonLiles/LIS4273AdvStatistics/blob/master/LIS4273Mod3.R