Best common metric for comparing classic time series forecasting methods (ARIMA/Prophet) with ML approach?

Question

I am new to time series forecasting and am looking to compare the performance of ARIMA/Prophet with an XGBoost model in predicting future stock market values from historical stock market data and social media sentiment scores.
I am more familiar with machine learning, so I would usually use an evaluation metric like $R^2$ to assess model performance for this sort of problem.
Are there common evaluation metrics used to assess the accuracy of forecasting methods like ARIMA/Prophet, so that I can make a like-for-like comparison with XGBoost's prediction accuracy?

Neelesh Shukla · Answer

Please refer to the paper "Another look at measures of forecast accuracy" by Rob J Hyndman, a pioneer in time series forecasting who contributed the forecast package in R.
From the paper's conclusion:

We propose that scaled errors become the standard measure for forecast accuracy, where the forecast error is scaled by the in-sample mean absolute error obtained using the naïve forecasting method. This is widely applicable and is always defined and finite except in irrelevant cases where all historical data are equal. This new measure is also easily interpretable: values of MASE greater than one indicate the forecasts are worse, on average, than in-sample one-step forecasts from the naïve method.

Of course, there will be situations where some of the existing measures may still be preferred. For example, if all series are on the same scale, then the MAE may be preferred because it is simpler to explain. If all data are positive and much greater than zero, the MAPE may still be preferred for reasons of simplicity. However, in situations where there are very different scales including data which are close to zero or negative, we suggest the MASE is the best available measure of forecast accuracy.
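
The scaling described in the quote can be sketched in a few lines of NumPy. This is a minimal illustration rather than the forecast package's implementation: the function name `mase` and the sample numbers are assumptions made up for the example, and the naive method is taken to be the one-step "previous value" forecast.

```python
import numpy as np


def mase(y_train, y_true, y_pred):
    """Mean absolute scaled error (hypothetical helper, not a library function).

    The forecast MAE is divided by the in-sample MAE of the one-step
    naive forecast, which predicts each training value with the one before it.
    """
    y_train = np.asarray(y_train, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # In-sample naive errors are just the absolute first differences.
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    if naive_mae == 0:
        # The "irrelevant case" from the quote: all historical data are equal.
        raise ValueError("MASE is undefined when all training values are equal")

    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)


# Made-up numbers, purely for illustration.
train = [10.0, 12.0, 11.0, 13.0]
actual = [14.0, 15.0]
forecast = [13.0, 13.0]
mase_value = mase(train, actual, forecast)
```

A value below one means the forecasts beat the in-sample naive benchmark on average. Because the function only needs arrays of actuals and predictions, the same call works for ARIMA, Prophet, or XGBoost output.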

Donald S · Answer

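MAPE and MASE are common time series metrics that you may not be familiar with. Below, $y_t$ is the actual value, $\hat{y}_t$ is the forecast, the training series has $T$ observations, and the forecasts cover periods $T+1, \dots, T+n$.
MAPE - Mean Absolute Percent Error:

$$\text{MAPE} = \frac{100\%}{n} \sum_{t=T+1}^{T+n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|$$

MASE - Mean Absolute Scaled Error (the forecast MAE divided by the in-sample MAE of the one-step naïve forecast):

$$\text{MASE} = \frac{\frac{1}{n} \sum_{t=T+1}^{T+n} \left| y_t - \hat{y}_t \right|}{\frac{1}{T-1} \sum_{t=2}^{T} \left| y_t - y_{t-1} \right|}$$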

Reference:
https://blogs.oracle.com/datascience/7-ways-time-series-forecasting-differs-from-machine-learning
Consider also using several metrics for your assessment rather than just one, as each metric serves a slightly different purpose.
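
For readers coming from ML, MAPE is easy to compute by hand for any model's output. The sketch below assumes NumPy; the function name `mape` and the sample values are hypothetical, and the check for zeros reflects the fact that the percentage error is undefined when an actual value is zero.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percent error, in percent (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        # Division by a zero actual value makes the percentage error undefined.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


# Made-up actuals and forecasts, purely for illustration.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
mape_value = mape(actual, forecast)  # errors of 10%, 10% and 0%
```

Since stock prices are positive and well away from zero, MAPE is at least well defined for this use case, though it can still be worth reporting alongside a scale-dependent metric.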

Fnguyen · Answer

In general, the most common metrics mentioned in several articles (e.g. this one) are the same ones we commonly use for non-time-series prediction:

MAE - Mean absolute error
MSE - Mean squared error
RMSE - Root mean squared error

Outside of linear regression, I have not seen R² used that often to validate prediction models. In fact, it isn't even one of the out-of-the-box metrics of xgboost for interval-scale prediction tasks, so you should be fine using RMSE for everything.
Edit:
Prophet, at least in the R package, covers RMSE, MAE and MAPE, so you should be fine.
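
One way to keep the comparison like-for-like is to score every model's forecasts with the same function over the same holdout period. The sketch below is only an illustration: the helper `point_forecast_errors`, the model labels, and all the numbers are hypothetical stand-ins for the output of fitted ARIMA, Prophet, or XGBoost models.

```python
import numpy as np


def point_forecast_errors(y_true, y_pred):
    """Return MAE, MSE and RMSE for one set of forecasts (hypothetical helper)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    return {"MAE": mae, "MSE": mse, "RMSE": float(np.sqrt(mse))}


# Made-up holdout values and forecasts. In practice each array would hold a
# fitted model's predictions for the same dates as the holdout.
holdout = [100.0, 102.0, 104.0, 103.0]
forecasts = {
    "arima": [101.0, 101.0, 103.0, 105.0],
    "xgboost": [100.0, 104.0, 104.0, 101.0],
}

# Score every model with the same function on the same holdout.
scores = {name: point_forecast_errors(holdout, pred) for name, pred in forecasts.items()}
```

With these made-up numbers the two models rank differently under MAE and RMSE, because RMSE penalizes the larger individual errors more heavily.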
