
Is autocorrelation of residuals a problem in machine learning?

Data Science · Asked on March 28, 2021

Let’s assume I have a random forest model and the residuals of the model are autocorrelated. Is this a problem?

As an example, let’s assume I have two different random forest models, A and B, with a similar predictive performance. The residuals of model A are less autocorrelated than the residuals of model B. Should I prefer model A?

3 Answers

Yes, autocorrelation in residuals is a problem, essentially because it is a clear sign that there was more learnable information in the process you are modelling that your model missed.

In the unlikely event that you have two equally performant models but one shows significant autocorrelation (you can test for this using the Durbin–Watson test, as suggested in Noah Weber's answer), this suggests that neither model is working as well as we might hope: the autocorrelated model has failed to capture some predictable patterns, and the other model is failing in some other way, since its predictive power is no better.
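
As a minimal, self-contained sketch of that idea (the data below is synthetic and purely illustrative): the target contains a time-dependent component that is not available as a feature, so the random forest cannot learn it and the residuals come out autocorrelated.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from statsmodels.tsa.stattools import acf

# Synthetic, time-ordered data: the target has a slow seasonal component
# that depends on time, which is deliberately NOT included in the features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + np.sin(np.arange(500) / 10) + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
residuals = y - model.predict(X)  # ideally use held-out predictions instead

# Residual autocorrelations at the first few lags; values far from 0
# indicate structure the model failed to capture.
print(acf(residuals, nlags=5))
```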

If you have two models that have different residuals but both are beating a naïve baseline, you’ve probably got models that will ensemble well.
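
A minimal sketch of such an ensemble, assuming `model_a` and `model_b` are hypothetical, already-fitted scikit-learn-style regressors:

```python
# Minimal sketch: blend two fitted regressors by averaging their predictions.
# model_a and model_b are hypothetical, already-fitted models with a .predict method.
def ensemble_predict(model_a, model_b, X, weight_a=0.5):
    """Weighted average of the two models' predictions on feature matrix X."""
    return weight_a * model_a.predict(X) + (1.0 - weight_a) * model_b.predict(X)
```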

Correct answer by Nicholas James Bailey on March 28, 2021

Choose model A if the autocorrelation is significant.

Residuals ("mistakes in predictions") should be completely random, i.e. behave like white noise. If they are significantly autocorrelated, they are not truly random: the independent-error assumption is violated, and the usual variance estimates are no longer robust. Prefer model A.

How do you measure whether the autocorrelation is significant? With the Durbin–Watson test, for example.
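
A minimal sketch of that check with statsmodels (the residual array here is a hypothetical stand-in for your model's prediction errors):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Hypothetical residuals (y_true - y_pred); substitute your model's errors.
rng = np.random.default_rng(0)
residuals = rng.normal(size=200)

# Values near 2 suggest no first-order autocorrelation;
# values near 0 suggest positive, values near 4 negative autocorrelation.
print(durbin_watson(residuals))
```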

Answered by Noah Weber on March 28, 2021

If you fit a model and find a meaningful signal in the residuals, you should engineer more or better features to capture that signal.
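
For example (a sketch only, assuming time-ordered data in a pandas DataFrame with a target column named "y"; the helper below is hypothetical, not from the answer), lagged copies of the target are one common way to give a model access to that temporal signal:

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame, target: str = "y", lags=(1, 2, 3)) -> pd.DataFrame:
    """Add lagged copies of the target column as candidate features.

    Assumes the rows are already in time order; the first max(lags) rows
    are dropped because their lag values are undefined.
    """
    out = df.copy()
    for lag in lags:
        out[f"{target}_lag{lag}"] = out[target].shift(lag)
    return out.dropna()
```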

A specific example is "Neglecting spatial autocorrelation causes underestimation of the error of sugarcane yield models" by Ferraciolli et al., which found:

We showed that assuming independence when modeling yield leads to underestimating model errors and overfit …

They then changed the feature selection process to reduce those errors.

Answered by Brian Spiering on March 28, 2021
