On time series features and forecasting by temporal aggregation

.large[.alert-bottom1[LSDG] <br> <br> <br> <br>]
.center[.title[On time series features and forecasting by temporal aggregation]]
.sticker-float[![logo](resources/carbts_t.png)]

.bottom[
Bahman Rostami-Tabar (<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#1da1f2;overflow:visible;position:relative;"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg>[@Bahman_R_T](https://twitter.com/Bahman_R_T)) <br>
Website [www.bahmanrt.com](https://www.bahmanrt.com/) <br>
In collaboration with Dejan Mircetic
]

---
background-image: url("resources/hierarchy-left.jpeg")
background-size: contain
background-position: left
class: middle

.pull-right2[
## Outline

- Temporal aggregation: why do we need it in time series forecasting and what are the common approaches?

- How does temporal aggregation approaches perform on M4 competition data?

- How does data temporal aggregation changes time series features and how might time series features affect the forecasting performance of AD versus AF ?
]

---
background-image: url("resources/hierarchy-left.jpeg")
background-size: contain
background-position: left
class: middle

.pull-right2[
## Outline

- .remember[Temporal aggregation: why do we need it in time series forecasting and what are the common approaches?]

- How does temporal aggregation approaches perform on M4 competition data?

- How does data temporal aggregation changes time series features and how might time series features affect the forecasting performance of AD versus AF ?
]

---

## Using time series forecasting to inform decisions

.center[<img src="figs/Framework.png" width="700px">]

.footnote[Babai, M. Zied, John E. Boylan, and Bahman Rostami-Tabar. "Demand forecasting in supply chains: a review of aggregation and hierarchical approaches." International Journal of Production Research (2021): 1-25.]

---
## Data and forecast time granularity

*  Forecasting time granularity level and its horizon are determined by decisions made in the light of forecast.

* One common assumption is that time series granularity matches forecast requirement, i.e. to produce daily forecasts, we use daily time series.

* However, the level of time series granularity .remember[does not necessarily match] the level of forecast granularity.

* The level of temporal granularity in the forecast might be lower than the existing time series granularity. For instance, while a forecast might be required at the annual level, a monthly time series is available. With advances in IT, data is often recorded at the finest temporal granularity (e.g. arrival time)

---
## Time series forecasting problem
<br><br>

* We consider a time series forecasting problem where an original time series has a higher temporal granularity (e.g. monthly) than the required forecast (e.g. annual).

* We aim to generate a forecast of the total value over a number of time periods ahead,  .remember[forecast horizon aggregation] or forecast over the leadtime period.

.footnote[1 Mohammadipour, Maryam, and John E. Boylan. "Forecast horizon aggregation in integer autoregressive moving average (INARMA) models." Omega 40.6 (2012): 703-712.]

---
class: middle

**A key question then to be answered is: **

should the original series be used to generated the forecast for the required horizon and then sum them up to obtain the forecast horizon aggregation (lead-time), i.e. .remember[Aggregate Forecast (AF)] or should we first aggregate time series to match the forecast requirement granularity and then extrapolate directly at that level, i.e. .remember[Aggregate Data (AD)].

** I will illustrate these approaches usign a simple example.**

.footnote[**There is no disaggregation to the original time granularity**]

---
class: inverse
## Terminilogy

**One time series**

- Data time granularity (e.g. daily, monthly, annual)
- Forecast time granularity (e.g. daily, monthly, annual)
- Forecast horizon (e.g. 12 months ahead)
- Forecast horizon aggregation /leadtime (e.g. 1 week, 1 quarter, 1 year)
- Temporal aggregation
    * Aggregate Forecast (or Bottom-Up)
    * Aggregate Data
      - Non-overlapping temporal aggregation (NOA)
      - Overlapping temporal aggregation (OA)
          
---
## Forecast horizon aggregation: an example

---
## Temporal aggregation: aggregate forecast

---
## Temporal aggregation: aggregate forecast

---
## Non-overlapping temporal aggregation: aggregate data

---
## Using information at multiple levels of time granularity instead of a single level - .remember[MAPA]

.footnote[Kourentzes, Nikolaos, Fotios Petropoulos, and Juan R. Trapero. "Improving forecasting by estimating time series structural components across multiple frequencies." International Journal of Forecasting 30.2 (2014): 291-302.]

---
## Using information at multiple levels of time granularity instead of a single level- .remember[temporal hierarchies]

.footnote[Athanasopoulos, George, et al. "Forecasting with temporal hierarchies." European Journal of Operational Research 262.1 (2017): 60-74.]

---
class: inverse, center, middle

**It is often recommended to aggregate data and then forecast when a time series history is recorded at a higher frequency time granularity (e.g. monthly) and forecast is required at alower level (e.e. annual).**

For an exmpel, please refer to page 153 of Profit from Your Forecasting Software, by Paul Goodwin.

**Let's examine the performance of aggregating data versus aggregating forecast approaches using M4 competition dataset**

---
background-image: url("resources/hierarchy-left.jpeg")
background-size: contain
background-position: left
class: middle

.pull-right2[
## Outline

- Temporal aggregation: why do we need it in time series forecasting and what are the common approaches?

- .remember[How does temporal aggregation approaches perfrom on M4 competition data?]

- How data temporal aggregation changes time series features and how might time series features affect the forecasting performance of AD versus AF?
]

---
## Time series data

.pull-left[
- M4 competition data time series

- 24,000 Quarterly
    - 48,000 monthly
    - 4,227 daily
    
- Time series features
    - 42 features
    - Extract features using `tsfeatures::tsfeatures()` in R
]

.pull-right[
- Forecasting methods: Exponential Smoothing State Space (ETS).
- Point forecast accuracy measure: Mean Absolute Scaled Error (MASE), Root Mean Squared Scaled Error (RMSSE), and more.
- Time series cross validation is performed.
]

.footnote[https://supplychainanalytics.shinyapps.io/Evaluation_of_ML_models/.]

---
class: middle, center

.pull-left[
### MASE
`$$\text{MASE} = \text{mean}(|q_{j}|).$$`

where
`$$q_{j} = \frac{\displaystyle e_{j}}
    {\displaystyle\frac{1}{T-m}\sum_{t=m+1}^T |y_{t}-y_{t-m}|}.$$`
]

.pull-right[
### RMSSE
`$$\text{RMSSE} = \sqrt{\text{mean}(q_{j}^2)},$$`

where

`$$q^2_{j} = \frac{\displaystyle e^2_{j}}
    {\displaystyle\frac{1}{T-m}\sum_{t=m+1}^T (y_{t}-y_{t-m})^2},$$`
]

---
## M4 Monthly time series features

---
## M4 Monthly time series features

---
### Percentage of series for which each approach was more accurate ( using MASE)

.footnote[Rostami-Tabar B., Goltsos. T, Wang Shixuan. "Forecasting for lead-time period by temporal aggregation: Whether to combine and how." Computers in Industry, Under Review]

---
## Performance of AF vs. AD (based on non-overlapping temporal aggregation)

---
## What does the literature recommend?

1. Data temporal aggregation leads to overall forecast improvement;

2.  Data temporal aggregation does not lead to forecast improvement if autocorrelation of series is highly positive, when the time series follows an ARMA(1,1) process and forecasting method is simple exponential smoothing [1];

3. There is still a lack of rules and indications on which approach should be used given time series characteristics/features [2].

<br><br>
.xsmall[[1]. Rostami‐Tabar, B., Babai, M. Z., Syntetos, A., & Ducq, Y. (2013). Demand forecasting by temporal aggregation. Naval Research Logistics (NRL), 60(6), 479-498.] <br>
.xsmall[[2]. Babai, M. Zied, John E. Boylan, and Bahman Rostami-Tabar. "Demand forecasting in supply chains: a review of aggregation and hierarchical approaches." International Journal of Production Research (2021): 1-25.]

---
class: middle
## Research question

Given the comparative performance of temporal aggregation approaches :

.huge-bahman-text[
1. How does data temporal aggregation change time series features ?

2. How time series features might be associated with the forecasting performance of AD and AF?]

---
class: inverse middle center

.huge-bahman-text[This is the first study that aims to shed lights on how the performance of temporal aggregation approaches (i.e. both AF and AD) is associated with time series features.]

---
background-image: url("resources/hierarchy-left.jpeg")
background-size: contain
background-position: left
class: middle

.pull-right2[
## Outline

- Temporal aggregation: why do we need it in time series forecasting and what are the common approaches?

- How does temporal aggregation approaches perfrom on M4 competition data?

- .remember[How data temporal aggregation changes time series features and how might time series features affect the forecasting performance of AD versus AF?]
]

---
## Experiment design

---
## How does non-overlapping TA change time series features?

---
## How does non-overlapping TA change time series features (continue)?

---
## Features relationship and AD/AF performance

---
# MCB test for all classiefiers

We also use missclassification error, F-statistics and Area under the Curve(AUC).

---
## Important features

???
feature importance or variable importance, help us understand which features are most important in driving the predictions of these two models overall, aggregated over the whole training set.
One way to compute variable importance is to permute the features (Breiman 2001a). We can permute or shuffle the values of a feature, predict from the model, and then measure how much worse the model fits the data compared to before shuffling.

---
## Partial dependence plot
### Probability of AF performing better
<img src="figure/pfinal.png" width="45%" style="display: block; margin: auto;" />

---
## Partial dependence plot (continue)
### Probability of AF performing better

---
## Summary and conclusions (continue)

- Although aggregating time series seems to be intutive, it might not always improve forecast accuracy. Our results indicate that Aggregate Forecast is a competitive approach, but neither of them dominate. They both have a merit.

- Combining aggregate data (non-overlapping and overlapping) and aggregate forecast approaches improve forecast accuracy. Combination again works here.

- Aggregate data using temporal aggregation changes the features of time series. The magnitude of the change varies for different features. In particular, we observe that with increase in the aggregation level, the strength of seasonality, the autocorrelation, coefficient of variation, linearity, curvature  and KPSS unitroot statistic decrease. However, non-linearity, mean, variance, ARCH.LM, trend , unitroot pp statistics increase. Entropy is the only measure that both increases and decreases based on its initial value.

---
## Summary and conclusions

- Random Forest model is the most accurate classifier among ML algorithm in predicting which approach provides more accurate forecast given a set of time series features as input.

- The most important features for predicting whether AF or AD should be used for a given monthly time series in M4 competition include *curvature*, *nonlinearity*, *seas_pacf*, *unitroot_up*, *mean*, *ARCHM.LM*, *Coifficient of Variation*, *stability*, *linearity* and *max_level_shift*.

- Increasing trend, ARCH.LM, hurst, autocorrelation lag 1 and unitroot_pp and seas_pacf may increases the chance of AF performing better.

- Increasing lumpiness, entropy, no-linearity, curvature, stremgth of seasonality may increase the chance of AD performing better, so the strong presence of these features may favorite AD over AF.

---
## Future works

- Effect of data temporal aggregation on prediction interval and/or forecast density;

- Extend the study to intermittent time series, daily and sub-daily time series. Given the features of such series, the findings would add value to the current knowledge state;

- Using dimension reduction to group all features representing the same type of information such as seasonality, autocorrelation, noise, etc and then build the model;

- How overlapping temporal aggregation affects series features and its connection to forecasting performance ;

---
## Future works (Continue)

-  Examine approaches such as MAPA and Temporal hierarchies utilising and their association to time series features;

- Investigate the link between time series features and forecast accuracy when forecasts are required at the original higher frequency, hence a disaggregation approach should be employed.

---

.pull-left[
### Wrok in progress
- Rostami-Tabar B., Goltsos T. Wang, S. (2022), Forecasting for lead-time period by temporal aggregation: Whether to combine and how

- Rostami-Tabar B., Mercetic D. (2022), On time series features and the perfromance of emporal aggregation
]

.pull-righ[
### Published recently
- Mircetic, D., et al. (2021), "[Forecasting hierarchical time series in supply chains: an empirical investigation](https://www.tandfonline.com/doi/full/10.1080/00207543.2021.1896817)." International Journal of Production Research, 1-20.
- Babai. M.Z., Boylan, J., Rostami-Tabar, B. (2021), "[Demand Forecasting in Supply Chains: A Review of Aggregation and Hierarchical Approaches](https://www.tandfonline.com/doi/full/10.1080/00207543.2021.2005268)", International Journal of Production Research, 1-25.
]

---
## References for temporal aggregation forecasting

- [An aggregate–disaggregate intermittent demand approach (ADIDA) to forecasting: an empirical proposition and analysis](https://www.tandfonline.com/doi/full/10.1057/jors.2010.32?casa_token=FLX_iKeIDXcAAAAA%3ACXYWY6jICM_1_ayaadc8GXxN05kAFo5I_qqmt7XvBjEMTHBUTWLA8kziBWQhUVj-BdNWTwJnIw). Journal of the Operational Research Society.
- [Improving forecasting via multiple temporal aggregation](https://www.sciencedirect.com/science/article/pii/S0169207013001477?casa_token=PhrGiXHJJzsAAAAA:-PU7metoOVL4G7avKR6NT9m5kzGNHPy5Lo14iEhVHqtju_L_hRUatM0M3CV3UilcBA47EuU). International Journal of Forecasting.
- [Demand forecasting by temporal aggregation](https://onlinelibrary.wiley.com/doi/full/10.1002/nav.21546?casa_token=wfP5AIk8wAQAAAAA%3A4skkyZgQCyVdftE194ZG_16CgG7CfL6-6_kb2Sqi0aiJ0aC4cWL4x2bmmRMPdupj4P4_9lihPLj3), Naval Research Logistics
- [Forecasting with temporal hierarchies](https://www.sciencedirect.com/science/article/pii/S0377221717301911?casa_token=wVe_QYpCEFoAAAAA:LT-rFP_KTK8Wbr1iQnqpGpNXjKiocfoSBuM4-0SfYTEB_6njOQcELohyPLiuPQuSgEkstCc), European Journal of Operational Research

---
class: inverse, middle

- Slides and papers: [www.bahmanrt.com](https://www.bahmanrt.com/)
- Check out also [www.f4sg.org](https://www.f4sg.org/)

<br><br>
<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> Say hello: [@Bahman_R_T](https://twitter.com/Bahman_R_T)