
RF regressor for probabilities

Data Science Asked by Wacken0013 on August 10, 2021

I am using sklearn's multioutput RF regressor to learn statistics in my data. My target contains several probabilities for the different features, and the sum of these probabilities is one, as they are fractions of how often each feature occurs.
The RF actually learns this property even though I have not enforced anywhere that the outputs should sum to one. I have also added a constant to my targets, and the RF then learns that the outputs should sum to one plus that constant, so it is not some normalization.
I'm pretty sure I know how an RF regressor works, but I can't explain how it can learn such meta-features of my data. I would have expected the sum of my outputs to be somewhere around 1, not always exactly one.
Any ideas?

One Answer

This is indeed expected behavior, because of the way tree models handle multioutput problems. Each node contains some number of training samples, and the prediction for each output is the average of those samples' values for that output. Since averaging commutes with summation, the property of summing to 1 is preserved. I'm not sure if this will help, but in symbols:

$$ \sum_{\text{output } i} p_i = \sum_{\text{output } i} \left(\operatorname*{avg}_{\text{sample } j} p_i^j\right) = \operatorname*{avg}_j \left(\sum_{\text{output } i} p_i^j\right) = \operatorname*{avg}_j 1 = 1.$$

Then for the entire forest, you're just applying another average (over the trees), so the property is again maintained.
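
As a quick check (a minimal sketch, not part of the original answer, using synthetic data and default hyperparameters), you can fit a multioutput `RandomForestRegressor` on targets whose rows sum to one and verify that the predictions do too:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data: 3 features, 4 outputs that are probabilities summing to 1.
X = rng.normal(size=(500, 3))
raw = np.exp(X @ rng.normal(size=(3, 4)))    # positive score per output
y = raw / raw.sum(axis=1, keepdims=True)     # each row sums to exactly 1

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

pred = model.predict(rng.normal(size=(10, 3)))
print(pred.sum(axis=1))  # every row is 1, up to floating-point error
```

Because every leaf value is an average of training rows that each sum to one, and the forest prediction is an average of leaf values, the printed sums come out as 1 regardless of the input.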

Correct answer by Ben Reiniger on August 10, 2021
