Piper Aircraft Time Series Analysis
Note: This project was part of an academic project for a course at Iowa State University.
Purpose: The purpose of this project was to fit a time series model to a dataset of our choice to demonstrate what we learned throughout the semester. I choose to use some data about my employer, Piper Aircraft, and see if I could build a forecast model to predict future quarterly deliveries.
Piper Background: As of spring 2021, Piper Aircraft has been manufacturing general aviation aircraft since 1937. In spring of 2021, they manufactured 7 models, across 2 distinct product and value lines. The general aviation industry is highly volatile industry and is heavily influenced by the global financial market and airline travel. Piper’s quarterly deliveries are no different over the last 21 years. Piper’s quarterly deliveries since 2000 has been highly variable and the data is non-stationary. They got hit hard during the 2008 recession and its effects lingered for a few years. However, since 2010/2011 they have been steadily increasing, that is until 2020 when the world began to see the impacts of COVID-19.
Dataset: The source of this data came from GAMA, the General Aviation Manufactures Association’s, Quarterly Shipment Reports. I was able to easily download quarterly PDF reports from Q1 2000 to Q4 2020, and combine the data in Excel. This gave me 84 data points (21 observations per quarter) for 21 years.
Differencing and Exploration: Because of the non-stationarity and fluctuating sales history I mentioned before, I knew I needed to take the first difference to the data to make it stationary.
From working at Piper, I also knew that 4th quarter was usually a high quarter for shipments due to it being the end of the calendar year, tax year, and Piper’s fiscal year. As you can see by this plot, the yellow points representing 4th quarter, are usually a peak, while the blue points following those are almost always the lowest value within the year. This trend continued to affect the ACF and PACF plots throughout the analysis.
Model Identification: Using the RTseries package in R Studio, I was able to use the iden() function to fit the data. I first fit the data using a normal difference and then once again using a seasonal difference.
Due to the ACF cutting off after lag 1, and the PACF either dying down or cutting off after lag 1, it appears the model could be an ARIMA(1,1) model with a seasonal component as discussed previously.
Model Comparison: I fit the data to 3 time series models:
Model 1: An ARMA(1,1) model with a first difference, seasonal moving average difference, and a single seasonal moving average term. The SARIMA notation for this model is (1,1,1)(0,1,1)4.
Model 2: A MA(1) model with first difference, seasonal moving average difference, and a single seasonal moving average term. The SARIMA notation for this model is (0,1,1)(0,1,1)4.
Model 3: Lastly, I tried a MA(2) model with a first difference, seasonal difference, and 2 seasonal moving average terms. I was hoping this model would reduce some of the variability in model 2, but the results were unfavorable, so I dropped it from further analysis.
Model 1 | Model 2 | Model 3 | |
---|---|---|---|
SARIMA Notation | (1,1,1)(0,1,1)4 | (0,1,1)(0,1,1)4 | (0,1,2)(0,1,2)4 |
Gamma (γ) | 1 | 1 | 1 |
Regular Difference (d) | 1 | 1 | 1 |
Seasonal Difference (D) | 1 | 1 | 1 |
φ1 (p) | 0.321 (1.460) | ||
θ1 (q) | 0.749 (4.37) | 0.480 (3.85) | 0.418 (3.73) |
θ2 (q) | 0.112 (1.01) | ||
Φ1 (P) | |||
Θ1 (Q) | 0.776 (10.01) | 0.791 (11.31) | 0.928 (7.36) |
Θ2 (Q) | -0.165 (-1.379) | ||
S | 14.82 | 15.03 | 14.66 |
Significant Spikes | 0 | 0 | 0 |
AIC | 662.74 | 662.62 | 663.24 |
Log Likelihood | 654.74 | 656.62 | 653.2435 |
LjungBox Stats (30 df) | 26.36 (0.65) | 28.23 (0.55) | 29.53 (0.49) |
Number of Parameters | 3 | 2 | 4 |
R Code:
esti(PiperShipments.tsd, model=model.pdq(p=1,d=1,q=1,P=0,D=1,Q=1), print.table = TRUE, y.range=c(0,250)) #SARIMA(1,1,1)(0,1,1) - Model 1
esti(PiperShipments.tsd, model=model.pdq(p=0,d=1,q=1,P=0,D=1,Q=1), print.table = TRUE, y.range=c(0,250)) #SARIMA(0,1,1)(0,1,1) - Model 2
esti(PiperShipments.tsd, model=model.pdq(p=0,d=1,q=2,P=0,D=1,Q=2), print.table = TRUE, y.range=c(0,250)) #SARIMA(0,1,2)(0,1,2) - Model 3
Residual Analysis: The results of model 1 and model 2 were very similar so I hoped taking a look at the residuals would help. Overall, the residual values were quite high, but I attribute this to the relatively small sample size and high variability in the data. I plotted the two residual plots on top of each other, and the results were still similar. Model 2 performed slightly better than model 1 as the residual average was closer to 0.
R Code:
Mod1Resid <- resid(esti(PiperShipments.tsd, model=model.pdq(p=1,d=1,q=1,P=0,D=1,Q=1), print.table = TRUE, y.range=c(0,250))) #SARIMA(1,1,1)(0,1,1) - Model 1
Mod2Resid <- resid(esti(PiperShipments.tsd, model=model.pdq(p=0,d=1,q=1,P=0,D=1,Q=1), print.table = TRUE, y.range=c(0,250))) #SARIMA(0,1,1)(0,1,1) - Model 2
Forecasts: Here is the forecasts for model 1 and model 2. In both you can see seasonal spikes continuing throughout the forecasts due to the seasonal moving average term.
R Code:
esti(PiperShipments.tsd, model=model.pdq(p=1,d=1,q=1,P=0,D=1,Q=1), print.table = TRUE, y.range=c(0,250)) #SARIMA(1,1,1)(0,1,1) - Model 1
esti(PiperShipments.tsd, model=model.pdq(p=0,d=1,q=1,P=0,D=1,Q=1), print.table = TRUE, y.range=c(0,250)) #SARIMA(0,1,1)(0,1,1) - Model 2
It was hard to see the differences between the two forecasts so once again I plotted the two on top of each other to get a good picture of what was going on. The two forecasts were almost identical, within +/-1 aircraft shipment per quarter. Model 1 has a more narrow forecast interval, which I believe is due to the autoregressive parameter.
Results: Overall, my two models had a point forecast for Q1 2021 at 50.29 – 51.36 aircraft. Unfortunately, the actual value for Q1 2021 was 24 aircraft. This was much lower than my models predicted, this was heavily due to the impacts of COVID-19 and was similar to Q1 2020’s shipments.
Due to the similarities between the two models, I chose model 2 simply due to parsimony. Overall, model 2 has less terms and a similar result, so simple is more desirable in this situation. My final SARIMA model notation is a SARIMA(0,1,1)(0,1,1)4.