LIS 4273 Adv Statistics: Module 3 Bivariate Analysis

For Module 3 we were given a single data set and asked to perform a bivariate analysis on it. The data come from airport security records and give the number of pre-boarding screenings performed and the number of security violations detected between 1988 and 1999. The table below is a random selection of 20 cases. To do the analysis I wrote an R script, which is linked at the end of this post.

Case	Pre-boarding Screenings	Security Violations Detected
1	287	271
2	243	261
3	237	230
4	227	225
5	247	236
6	264	252
7	247	243
8	247	247
9	251	238
10	254	274
11	277	256
12	303	305
13	285	273
14	254	234
15	280	261
16	264	265
17	261	241
18	292	292
19	248	228
20	253	252

Table 1: Random sample of 20 cases of pre-boarding screenings and security violations detected during 1988-1999.

From the raw data alone it is difficult to draw a conclusion about the trend, although it does seem that when more pre-boarding screenings are performed, more security violations are detected. To clarify the relationship between the two variables, we can start by calculating the Pearson sample correlation coefficient and Spearman’s rank correlation coefficient. While hand calculations can be fun, R has built-in functions that make it very easy to find these values. While calculating them, we might as well create a scatterplot to make the trend in the data easier to see. The output from the R script is below.

Pearson Sample Correlation Coefficient	0.8375321
Spearman’s Rank Coefficient	0.7575423

Table 2: Coefficients calculated using R.

Figure 1: Raw scatterplot from R script.
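
As a rough sketch of how output like this can be produced (not necessarily identical to my linked script), the two coefficients and a basic scatterplot can be obtained with base R's cor() and plot(). The variable names screenings and violations, and the axis labels and title, are chosen here purely for illustration.

```r
# Data from Table 1; the variable names are illustrative assumptions.
screenings <- c(287, 243, 237, 227, 247, 264, 247, 247, 251, 254,
                277, 303, 285, 254, 280, 264, 261, 292, 248, 253)
violations <- c(271, 261, 230, 225, 236, 252, 243, 247, 238, 274,
                256, 305, 273, 234, 261, 265, 241, 292, 228, 252)

# Pearson sample correlation coefficient (the default method of cor()).
pearson <- cor(screenings, violations, method = "pearson")

# Spearman's rank correlation coefficient.
spearman <- cor(screenings, violations, method = "spearman")

print(pearson)
print(spearman)

# Basic scatterplot of the two variables.
plot(screenings, violations,
     xlab = "Pre-boarding Screenings",
     ylab = "Security Violations Detected",
     main = "Screenings vs. Violations")
```

Run on the 20 cases in Table 1, this should reproduce the values reported in Table 2.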

Both the Pearson sample correlation coefficient (about 0.84) and Spearman’s rank coefficient (about 0.76) are positive and fairly large, so the data show a strong, direct correlation. We can conclude that, in general, an increase in pre-boarding screenings is associated with an increase in security violations detected.
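
For anyone curious about what the built-in functions are doing, here is a minimal sketch that rebuilds both coefficients from their definitions: Pearson as the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations, and Spearman as the Pearson coefficient of the ranks. The names dx, dy, pearson_manual and spearman_manual are my own for this illustration.

```r
# Data from Table 1; the variable names are illustrative assumptions.
screenings <- c(287, 243, 237, 227, 247, 264, 247, 247, 251, 254,
                277, 303, 285, 254, 280, 264, 261, 292, 248, 253)
violations <- c(271, 261, 230, 225, 236, 252, 243, 247, 238, 274,
                256, 305, 273, 234, 261, 265, 241, 292, 228, 252)

# Deviations of each observation from its variable's mean.
dx <- screenings - mean(screenings)
dy <- violations - mean(violations)

# Pearson: cross-products of deviations, scaled by the spread of each variable.
pearson_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Spearman: Pearson correlation of the ranks.
# rank() assigns tied values their average rank by default.
spearman_manual <- cor(rank(screenings), rank(violations))

# Compare with the built-in results (TRUE means they agree within tolerance).
print(all.equal(pearson_manual, cor(screenings, violations)))
print(all.equal(spearman_manual,
                cor(screenings, violations, method = "spearman")))
```

Because this data set contains tied values (for example, three cases with 247 screenings), the sketch takes Spearman as the correlation of average ranks instead of using the shortcut formula based on squared rank differences, which is exact only when there are no ties.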

On its own this result is not worth much: whenever a population is screened for a particular trait, the number found with that trait will generally increase as sampling increases. For the airport, however, it is a good start toward creating a baseline for security violations. If the airport then sees a decrease in security violations without an associated and expected decrease in screenings, it can question whether its procedures are still effective. Or perhaps the decrease was expected, because the airport improved its program for communicating security rules to passengers, in which case the drop from baseline can be seen as a success.

See my Code on GitHub:

https://github.com/SimonLiles/LIS4273AdvStatistics/blob/master/LIS4273Mod3.R