
squared error vs absolute error in boosting trees for regression

Data Science: asked by Cheng Qian on July 25, 2021

The Elements of Statistical Learning says:

Using loss criteria such as the absolute error or the Huber loss (10.23) in
place of squared-error loss for regression, and the deviance (10.22) in place
of exponential loss for classification, will serve to robustify boosting trees.
Unfortunately, unlike their nonrobust counterparts, these robust criteria
do not give rise to simple fast boosting algorithms.

My question is: why does absolute error lead to a slower boosting algorithm? Can't we do the same thing with absolute error as with squared error, just with the power changed?

The book also mentions that squared error simplifies things because each new tree just fits the current residuals. But isn't the same true for absolute error?
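To make the step concrete, here is my reading of the stagewise fit the book describes (notation paraphrased from ESL, where $T(x;\Theta_m)$ is the tree added at step $m$ and $f_{m-1}$ is the current model): at iteration $m$ one solves

$$\hat{\Theta}_m = \arg\min_{\Theta_m} \sum_{i=1}^{N} L\bigl(y_i,\ f_{m-1}(x_i) + T(x_i;\Theta_m)\bigr).$$

With squared-error loss $L(y,f) = (y-f)^2$ this becomes

$$\sum_{i=1}^{N} \bigl(r_{im} - T(x_i;\Theta_m)\bigr)^2, \qquad r_{im} = y_i - f_{m-1}(x_i),$$

i.e. an ordinary least-squares regression tree grown on the residuals. With absolute error the objective is $\sum_i \lvert r_{im} - T(x_i;\Theta_m)\rvert$, so the new tree is still fit to the residuals, just under a different criterion, which is what makes me wonder where the slowdown comes from.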

Note: this is about plain (stagewise) boosting, not gradient boosting.
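For what it's worth, here is a minimal numpy sketch of the per-leaf subproblem as I understand it (my own toy illustration, not code from the book): for a fixed leaf, the optimal constant is the mean of the residuals under squared error and the median under absolute error.

```python
# Toy illustration (my own sketch, not from ESL): the optimal constant
# gamma for a fixed tree leaf under each loss function.
import numpy as np

rng = np.random.default_rng(0)
r = rng.normal(size=101)  # current residuals y_i - f_{m-1}(x_i) in one leaf

# Squared error: argmin_gamma sum_i (r_i - gamma)^2 is the mean.
gamma_sq = r.mean()

# Absolute error: argmin_gamma sum_i |r_i - gamma| is the median.
gamma_abs = np.median(r)

print(f"mean (squared-error leaf value):    {gamma_sq:.4f}")
print(f"median (absolute-error leaf value): {gamma_abs:.4f}")
```

My (possibly wrong) guess is that the mean is cheap inside tree induction because moving one point across a candidate split updates the sufficient statistics (count, sum) in O(1), whereas the median has no such constant-size summary, so each candidate split is more expensive to evaluate. Is that the reason, or is there more to it?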
