TransWikia.com

Using Regression Trees for Univariate Time Series Data

Cross Validated Asked on December 15, 2021

I have a monthly time series (105 observations) including trend and seasonality and want to forecast the numeric values.

I initially tested with the Box-Jenkins approach and other univariate models like Facebook Prophet.

Now I want to extend this to multivariate models, so I implemented a Regression Tree (scikit-learn's Decision Tree Regression).
I split my dataset into train and test data (89:16 observations). The most recent part of the time series is the test set.

To my surprise this Regression Tree worked very well on my test data, without extending it with other features. My dataframe consisted only of the time series and the index number.
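
For concreteness, the setup described above can be sketched as follows. This is only an illustration on a synthetic series (trend plus yearly seasonality plus noise), not my actual data; the column names `index_number` and `value` are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic monthly series (an assumption, not the real data):
# linear trend + yearly seasonality + noise, 105 observations.
rng = np.random.default_rng(0)
t = np.arange(105)
y = 100 + 0.8 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, size=105)
df = pd.DataFrame({"index_number": t, "value": y})

# Chronological 89:16 split: the most recent 16 months form the test set.
train, test = df.iloc[:89], df.iloc[89:]

# The index number is the only feature given to the tree.
tree = DecisionTreeRegressor(random_state=0)
tree.fit(train[["index_number"]], train["value"])
y_pred = tree.predict(test[["index_number"]])

mae = np.mean(np.abs(test["value"].to_numpy() - y_pred))
print(f"test MAE: {mae:.2f}")
print("distinct predicted values:", np.unique(y_pred).size)
```

In this sketch every test index lies beyond the largest training index, so all 16 predictions come from the same leaf of the tree. How good the resulting error looks depends on the series and on the metric used.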

My questions: Why does this Regression Tree work so well with only the time series as input? Does it include an autoregressive component, as the Box-Jenkins models do? I thought this model class required other features as input. Or is the index alone a valuable input for the Regression Tree?

2 Answers

There's a good explanation of how to adapt regression algorithms to forecasting problems here, including how to generate forecasts from the fitted regression models.

If you're interested, we're developing a toolbox that extends scikit-learn for exactly these use cases. So with sktime, you could write:

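# Note: written against an older sktime release. To my knowledge, later
# releases replace ReducedRegressionForecaster with make_reduction and
# smape_loss with mean_absolute_percentage_error(..., symmetric=True).
import numpy as np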
from sktime.datasets import load_airline
from sktime.forecasting.compose import ReducedRegressionForecaster
from sklearn.tree import DecisionTreeRegressor
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss

y = load_airline()  # load 1-dimensional time series
y_train, y_test = temporal_train_test_split(y)  
fh = np.arange(1, len(y_test) + 1)  # forecasting horizon
regressor = DecisionTreeRegressor()

# the ReducedRegressionForecaster takes care of adapting the 
# regressor to the forecasting problem
forecaster = ReducedRegressionForecaster(regressor, window_length=10)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
print(smape_loss(y_test, y_pred))
# output: 0.38916286717988746
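
To show what such a reduction does, here is a minimal hand-written sketch using only NumPy and scikit-learn. It assumes a sliding window of lagged values as features and a recursive strategy in which each prediction is fed back in as a lag; it is not meant to reproduce sktime's internals exactly. The helper names `make_windows` and `recursive_forecast` and the synthetic series are made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def make_windows(y, window_length):
    """Turn a 1-D series into (X, target) pairs of lagged windows."""
    X = np.array([y[i:i + window_length] for i in range(len(y) - window_length)])
    target = y[window_length:]
    return X, target


def recursive_forecast(model, history, window_length, n_steps):
    """Forecast n_steps ahead, feeding each prediction back in as a lag."""
    window = list(history[-window_length:])
    preds = []
    for _ in range(n_steps):
        next_value = model.predict(np.array(window).reshape(1, -1))[0]
        preds.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward
    return np.array(preds)


# Synthetic monthly series (an assumption): trend + seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(105)
y = 100 + 0.8 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, size=105)
y_train, y_test = y[:89], y[89:]

window_length = 10
X_train, target_train = make_windows(y_train, window_length)
model = DecisionTreeRegressor(random_state=0).fit(X_train, target_train)

y_pred = recursive_forecast(model, y_train, window_length, n_steps=len(y_test))
print(y_pred.shape)
```

Each row of `X_train` holds the 10 values preceding its target, so the tree is fitted on lagged values of the series itself. Because a tree predicts values stored in its leaves, the forecasts stay within the range of the training targets.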

Answered by mloning on December 15, 2021

I solved my issue: If you train a CART tree only with the time series data (univariate) and validate the model with the time series test part (also univariate), you will get a pretty low error rate.

The problem is that, to forecast further ahead, you need independent variables as input, not the target values of the time series itself. Otherwise you are trying to forecast from data you don't have, which doesn't work.
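
The missing-input point can be illustrated with a small pandas sketch. It assumes the features are simple lags of the target; the toy values and the names `lag_1` and `lag_2` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series of five observed values.
observed = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0], name="value")
n_future = 3

# Extend the index by the periods to be forecast; their values are unknown (NaN).
full = observed.reindex(range(len(observed) + n_future))

# Lag features built from the target itself.
features = pd.DataFrame({
    "lag_1": full.shift(1),
    "lag_2": full.shift(2),
})

# Feature rows for the periods beyond the end of the data.
future_rows = features.iloc[len(observed):]
print(future_rows)
```

Only the first future row has complete lag features. The later rows depend on values of the series that have not been observed yet.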

Answered by Steve on December 15, 2021
