For the second module of LIS 4273, two sets of data were given, and the central tendency and variation of each had to be described using the R language. Computing the various statistics is relatively easy; the important part is understanding what they describe. Below are the data that were given, followed by the results from the R script. The R script that I wrote is available through my GitHub, linked below.
Set 1 | 10 | 2 | 3 | 2 | 4 | 2 | 5 |
Set 2 | 20 | 12 | 13 | 12 | 14 | 12 | 15 |
Statistic | Set 1 | Set 2 |
Mean | 4 | 14 |
Median | 3 | 13 |
Mode | 2 | 12 |
Range | 8 | 8 |
Interquartile Range | 2.5 | 2.5 |
Variance | 8.333333 | 8.333333 |
Standard Deviation | 2.886751 | 2.886751 |
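The table can be reproduced with a short sketch like the one below. This is not the script from the GitHub repository; the names `set1`, `set2`, `get_mode`, and `describe` are assumptions made for illustration. Base R has no built-in function for the statistical mode, so a small helper is assumed here, and it returns the first value encountered if there is a tie. Note that `var()` and `sd()` use the sample (n - 1) denominator and `IQR()` uses R's default quantile method, which is consistent with the values in the table.

```r
# Data from the two sets given in the module
set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)

# Hypothetical helper: base R's mode() reports the storage type,
# not the most frequent value, so the statistical mode is computed by hand.
get_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Collect the measures of central tendency and variation for one vector
describe <- function(x) {
  c(
    mean     = mean(x),
    median   = median(x),
    mode     = get_mode(x),
    range    = max(x) - min(x),
    iqr      = IQR(x),   # default quantile method (type 7)
    variance = var(x),   # sample variance, n - 1 denominator
    sd       = sd(x)     # sample standard deviation
  )
}

# One column per data set, one row per statistic
results <- cbind(Set1 = describe(set1), Set2 = describe(set2))
print(results)
```

Wrapping the statistics in one function means both sets are guaranteed to be summarized the same way, which makes the side-by-side comparison fair.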
What is very interesting about these two data sets is that their measures of variation (range, interquartile range, variance, and standard deviation) are exactly the same. However, the mean, median, and mode differ between the two sets, and they differ in a consistent pattern: means of 4 and 14, medians of 3 and 13, modes of 2 and 12. Every value in the second data set is the corresponding value in the first plus 10. Adding a constant to every value moves the center by that constant but leaves the distances between values unchanged, which is why the variation measures match. This shows why central tendency and variation need to be reported together in descriptive statistics: the measures of variation alone could not tell these two sets apart.
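The relationship between the two sets can be checked directly in R. The sketch below is an illustration under assumed names (`shift_matches`, `center_diff`, `spread_diff`), not part of the original script. It tests whether Set 2 equals Set 1 plus 10, then takes the difference of each statistic between the sets.

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)

# TRUE if every value in Set 2 is the matching value in Set 1 plus 10
shift_matches <- all(set1 + 10 == set2)

# Differences in the measures of central tendency (Set 2 minus Set 1)
center_diff <- c(
  mean   = mean(set2) - mean(set1),
  median = median(set2) - median(set1)
)

# Differences in the measures of variation (Set 2 minus Set 1)
spread_diff <- c(
  range    = diff(range(set2)) - diff(range(set1)),
  iqr      = IQR(set2) - IQR(set1),
  variance = var(set2) - var(set1),
  sd       = sd(set2) - sd(set1)
)

print(shift_matches)
print(center_diff)
print(spread_diff)
```

If the shift explanation holds, the center differences should each equal the shift of 10 and the spread differences should all be zero, up to floating-point rounding.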
GitHub Link:
https://github.com/SimonLiles/LIS4273AdvStatistics/blob/master/LIS4273Mod2.R