LIS 4273 Adv Statistics: Module 3 Bivariate Analysis

For Module 3 a single data set is given, and a bivariate analysis must be performed on it. The data come from airport security records and give the number of pre-boarding screenings performed and the number of security violations detected from 1988 to 1999. The table below is a random selection of 20 cases. To do the analysis I wrote an R script, which is linked at the end of this post.

Case  Pre-boarding Screenings  Security Violations Detected
1     287                      271
2     243                      261
3     237                      230
4     227                      225
5     247                      236
6     264                      252
7     247                      243
8     247                      247
9     251                      238
10    254                      274
11    277                      256
12    303                      305
13    285                      273
14    254                      234
15    280                      261
16    264                      265
17    261                      241
18    292                      292
19    248                      228
20    253                      252
Table 1: 20 random samples of pre-boarding screenings and Security Violations Detected during 1988-1999.

From the raw data alone it is difficult to draw a conclusion about the trend, but it does seem that when more pre-boarding screenings are performed, more security violations are detected. To clarify the relationship between the two variables, we can start by calculating the Pearson sample correlation coefficient and Spearman’s rank coefficient. While hand calculations can be fun, R has built-in functions that make it very easy to find these values. While we are calculating them, we might as well create a scatterplot to make the trend in the data easier to see. The output from the R script is below.

Pearson Sample Correlation Coefficient  0.8375321
Spearman’s Rank Coefficient             0.7575423
Table 2: Coefficients calculated using R.
Figure 1: Raw scatterplot from R script.
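
For readers who want to check these numbers, here is a minimal sketch of how they could be computed. It is not necessarily identical to the linked script. It assumes the Table 1 values are typed in as two vectors; the names `screenings` and `violations` and the plot labels are my own illustrative choices. It also repeats the "hand calculation" in a few lines, to show what `cor()` is doing.

```r
# Data transcribed from Table 1 (20 randomly selected cases).
# Vector names are illustrative, not taken from the linked script.
screenings <- c(287, 243, 237, 227, 247, 264, 247, 247, 251, 254,
                277, 303, 285, 254, 280, 264, 261, 292, 248, 253)
violations <- c(271, 261, 230, 225, 236, 252, 243, 247, 238, 274,
                256, 305, 273, 234, 261, 265, 241, 292, 228, 252)

# Built-in route: cor() supports both coefficients via `method`.
pearson  <- cor(screenings, violations, method = "pearson")
spearman <- cor(screenings, violations, method = "spearman")

# "Hand" route for Pearson: sum of cross-products of deviations,
# divided by the square root of the product of the sums of squares.
dx <- screenings - mean(screenings)
dy <- violations - mean(violations)
pearson_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Spearman is the Pearson correlation of the ranks;
# rank() assigns average ranks to tied values by default.
spearman_by_hand <- cor(rank(screenings), rank(violations))

print(c(pearson = pearson, spearman = spearman))

# Scatterplot of the raw data, comparable to Figure 1.
plot(screenings, violations,
     xlab = "Pre-boarding screenings",
     ylab = "Security violations detected",
     main = "Violations detected vs. screenings performed")
```

Run as written, the two coefficients should agree with Table 2 to the digits shown there, and the hand-computed versions should match the built-in ones.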

From the Pearson sample correlation coefficient and Spearman’s rank coefficient we can see that the data show a strong, direct (positive) correlation. We can conclude that, in general, an increase in pre-boarding screenings is associated with an increase in security violations detected.

On its own this result is not worth much: whenever a population is screened for a particular trait, the number of cases found with that trait will generally increase as more of the population is screened. For the airport, however, it is a good start toward establishing a baseline for security violations. If the airport then sees a decrease in security violations without an associated and expected decrease in screenings, it can question whether its procedures are still effective. Or perhaps the decrease was expected, because the airport improved how it communicates security rules to passengers, in which case the drop from the baseline can be seen as a success.

See my Code on GitHub:

https://github.com/SimonLiles/LIS4273AdvStatistics/blob/master/LIS4273Mod3.R