A lot of data is simply a handful of variables that can be plotted in almost any way you want. Time series data adds a time component, which is not only continuous but also defines the order of the points. Time series plots range from the simple, with time on the x axis and a value on the y axis, to the complex, such as plotting a different variable on each axis and tracing the time component as a path through the points.
Today I will focus on a simple time series dataset: the Johnson and Johnson quarterly earnings from 1960 to 1980. This dataset can be found in the datasets package that ships with base R. To make the time series plots in this post, I wrote an R script, which can be found on my GitHub via the link at the bottom of this post.
To plot this I will use ggplot2 because I like its aesthetic. However, ggplot2 expects a data frame and does not work directly with time series (ts) objects, which the Johnson and Johnson dataset is. So I will use the ggfortify package and its wrapper functions for plotting time series data with ggplot2.
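To make the mismatch concrete, here is a minimal sketch of doing the conversion by hand. The data frame `jj_df` and its column names `year` and `earnings` are names I made up for illustration; this is not necessarily how ggfortify does it internally.

```r
library(ggplot2)

# Build an ordinary data frame from the ts object by hand.
# time() returns the time index as decimal years (1960.00, 1960.25, ...).
jj_df <- data.frame(
  year     = as.numeric(time(JohnsonJohnson)),
  earnings = as.numeric(JohnsonJohnson)
)

# With a data frame in hand, plain ggplot2 works as usual.
p <- ggplot(jj_df, aes(x = year, y = earnings)) +
  geom_line()
p
```

This works, but it is extra bookkeeping for every plot, which is why a wrapper is convenient.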
To begin I will load the necessary libraries and data into my workspace, like so.
#Load necessary libraries
library(ggplot2)
library(ggfortify)
#Load the data
jj_earning <- JohnsonJohnson
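Before plotting, it can help to check what kind of object was just loaded. The following is a small sketch using base R functions only; nothing here is specific to ggfortify.

```r
jj_earning <- JohnsonJohnson

class(jj_earning)      # "ts", a time series object rather than a data frame
frequency(jj_earning)  # 4, because the data is quarterly
start(jj_earning)      # 1960 1, the first quarter of 1960
end(jj_earning)        # 1980 4, the fourth quarter of 1980
length(jj_earning)     # 84 observations (21 years x 4 quarters)
```

The frequency of 4 is what later tells the decomposition how long one seasonal cycle is.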
Next I will use the autoplot() function that comes with the ggfortify package to make a simple time series plot with ggplot2 aesthetics.
autoplot(jj_earning)
This single line produces the following plot.
This is a fine plot, but besides not being very informative, it is boring to look at. So, to change it up a bit, let's use a ribbon geom and change the color. Again, it is a super simple line of code.
autoplot(jj_earning, ts.geom = "ribbon", fill = "darkblue")
And the result is the following plot.
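Because autoplot() hands back a regular ggplot object, the usual ggplot2 layers can be added on top of it. Here is a sketch that adds a title and axis labels; the label text is my own wording, and the "per share" unit comes from the dataset's help page (?JohnsonJohnson).

```r
library(ggplot2)
library(ggfortify)

# Same ribbon plot as above, with descriptive labels added via labs().
p <- autoplot(JohnsonJohnson, ts.geom = "ribbon", fill = "darkblue") +
  labs(
    title = "Johnson and Johnson quarterly earnings, 1960 to 1980",
    x     = "Year",
    y     = "Earnings per share (USD)"
  )
p
```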
Now we have solved the boring problem a little bit, but no real insight can be gained from the plot as it is. I can see lots of hills and valleys in this data, and there seems to be a pattern, but it is hard to tell just by looking at it like this.
To gain more insight from a time series plot, it is important to remember that a series has different components. The first is obvious: there is a trend, the overall upward or downward movement across the entire plot. When you decompose a time series, you may also find a repeating cycle, a seasonal variation so to speak. Whatever is left over is random variation that cannot be explained with this simple analysis.
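As a quick illustration of the three components, here is a sketch using base R's decompose(). Note that this is a simpler moving-average method and not the one used for the plots in this post; the variable name `parts` is just for illustration.

```r
# Classical additive decomposition: series = trend + seasonal + random
parts <- decompose(JohnsonJohnson, type = "additive")

head(parts$trend, 8)  # moving-average trend (NA at the very start and end)
parts$figure          # one seasonal effect per quarter
head(parts$random, 8) # what is left after removing trend and seasonal

# Wherever the trend is defined, the three parts add back up to the data.
ok      <- !is.na(parts$trend)
rebuilt <- parts$trend + parts$seasonal + parts$random
all.equal(as.numeric(rebuilt[ok]), as.numeric(JohnsonJohnson[ok]))
```

The last line is the important one: a decomposition does not throw any data away, it only splits each value into pieces.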
Fortunately, decomposing and plotting the parts of a time series is relatively simple. Instead of passing just the time series object to autoplot(), I will pass it inside of a function called stl(), which will decompose the time series for plotting. stl() breaks down the series into seasonal, trend, and irregular components using loess (do not worry about what that is; it is a type of regression fitting that is beyond the scope of this post). In all, the code will be as follows.
autoplot(stl(jj_earning, s.window = "periodic"), ts.geom = "ribbon", fill = "darkblue")
And it produces the following plots.
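The numbers behind those panels can also be pulled out of the stl() result directly. This is a small sketch; `jj_stl`, `parts`, and `rebuilt` are names I chose for illustration.

```r
jj_stl <- stl(JohnsonJohnson, s.window = "periodic")

# The components live in a multi-column time series.
parts <- jj_stl$time.series
colnames(parts)  # "seasonal" "trend" "remainder"
head(parts)

# Adding the three columns back together recovers the original series.
rebuilt <- rowSums(parts)
all.equal(as.numeric(rebuilt), as.numeric(JohnsonJohnson))
```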
Now we can get some insight from the time series data. In the seasonal component we can see an annual cycle: there is a small but steady increase from the first through the third quarter, followed by a sudden drop in the fourth quarter. The remainder also shows a steady pattern that would cancel out the seasonal variation, but towards the end it becomes very erratic.
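To read the annual cycle as numbers instead of from the picture, one option is to summarise the seasonal component by quarter. This is only a sketch; `seasonal` and `quarterly_effect` are names I made up for illustration.

```r
jj_stl   <- stl(JohnsonJohnson, s.window = "periodic")
seasonal <- jj_stl$time.series[, "seasonal"]

# cycle() labels each observation with its quarter (1 to 4).
# With s.window = "periodic" the seasonal value is the same every year,
# so the mean per quarter is simply that quarter's effect.
quarterly_effect <- tapply(seasonal, cycle(seasonal), mean)
round(quarterly_effect, 3)

# Which quarter sits lowest relative to the trend?
which.min(quarterly_effect)
```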
From patterns like the ones I mentioned, some inferences can be made, such as expecting slightly lower earnings in the fourth quarter every year. I also notice the instability that got picked out in the remainder section. While this could be a sign of instability in the market, I would say a more complicated analysis would be needed. At the beginning, the remainder component seems to have a seasonal pattern of its own, but after 1970 that pattern disappears. From this I would infer that the seasonal component of this time series is not as consistent as my simple analysis would lead you to believe; it is likely changing over time, as the company is still a dynamic entity. It is also worth noting that the trend component rises more steeply after 1970 and again after 1977, close to the changes in the pattern of the remainder component. These years seem to mark important events for the company, as its earnings also seem to increase dramatically.
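One rough way to put a number on how erratic the remainder gets is to compare its spread before and after 1970. This is just a sketch: the split year is taken from my reading of the plot, and the names `early` and `late` are for illustration only.

```r
jj_stl    <- stl(JohnsonJohnson, s.window = "periodic")
remainder <- jj_stl$time.series[, "remainder"]

# window() cuts a ts object by (year, quarter).
early <- window(remainder, end   = c(1969, 4))  # 1960 Q1 to 1969 Q4
late  <- window(remainder, start = c(1970, 1))  # 1970 Q1 to 1980 Q4

# Standard deviation as a simple measure of how much each part varies.
sd(early)
sd(late)
```

A larger value for the later period would back up what the plot suggests, though it does not explain why the change happens.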
Time series data is useful for analyzing change over time. There are a lot of components and variables that affect the variation in the data, and while some may not be explainable, it is possible to decompose the series into components that reveal new insights. The insights I drew from figure 3 are almost impossible to discern in figure 1 or 2. Other pieces of this visualization help too, such as the ribbon, which extends out from the line where y equals zero. This makes discerning positive versus negative effects much easier. While cumulative graphs and their exponential trend lines may impress executives, it takes careful attention to the details of data visualization to find something new and useful.
Links:
GitHub: https://github.com/SimonLiles/LIS4317DataVisualization/blob/main/LIS4317Mod10.R