This week I was tasked with creating a visualization using 6 of the 9 variables available in a given data set. The data set was originally sourced from data.gov and represents public transit in the United States. It originally had at least 12 variables; my professor filtered it down to 9, and from those I selected 6 to visualize in Tableau. The variables I chose are collisions with other vehicles, collisions with people, ridership, revenue by mile, and revenue by hour, combined with a time component (year). The result is the above graphic.
The collision data for each year are sums, while the ridership and revenue columns are medians. I used medians for ridership and for revenue by mile and by hour because extreme outliers in the data set push the mean very high and overstate the typical value. For collisions with motor vehicles and with people I used sums, because the mean and median for those columns are so low that they cannot be plotted on the same scale as the other columns. The y-axis is on a logarithmic scale because, on a linear scale, the large differences in magnitude made the smaller columns disappear. I also gave each variable its own color to make the variables easier to tell apart.
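To illustrate the reasoning above, here is a minimal Python sketch of why the median holds up better than the mean when there are outliers, and why a logarithmic axis keeps small values visible. The numbers are made up for illustration and are not taken from the actual data set, and the variable names are my own. The chart itself was built in Tableau, not with this code.

```python
import math
import statistics

# Hypothetical ridership values for one year (NOT from the real data set):
# five small agencies and one extreme outlier.
ridership = [1_200, 1_500, 1_800, 2_100, 2_400, 3_000_000]

mean_ridership = statistics.mean(ridership)      # pulled far upward by the outlier
median_ridership = statistics.median(ridership)  # stays near the typical agency

# Hypothetical yearly collision total (also made up).
collisions_sum = 45

# On a linear axis, bar height is proportional to the value itself,
# so the collision bar is a tiny fraction of the ridership bar.
linear_ratio = collisions_sum / median_ridership

# On a log10 axis, bar height is proportional to log10(value),
# so the same two bars end up at comparable heights.
log_ratio = math.log10(collisions_sum) / math.log10(median_ridership)

print("mean ridership:  ", mean_ridership)
print("median ridership:", median_ridership)
print("collision bar relative to ridership bar (linear):", round(linear_ratio, 3))
print("collision bar relative to ridership bar (log10): ", round(log_ratio, 3))
```

In this toy example a single outlier drags the mean to several hundred times the median, and the collision bar goes from a small sliver on a linear axis to roughly half the height of the ridership bar on a logarithmic one.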
As for what can be gleaned from this graphic, right now not much, although I do notice that total collisions seem to be on the rise from 2014 to 2018 while median ridership and revenues seem to be fairly stable. From this visualization I can now see a potential path of investigation, which is the increase in collisions. I could investigate where these collisions are occurring, and perhaps plot revenue by mile or by hour alongside the collision data.