LightGBM - Why Exclusive Feature Bundling (EFB)?

Data Science Asked on August 27, 2021

I’m currently studying GBDT and started reading LightGBM’s research paper.

In Section 4 they explain the Exclusive Feature Bundling algorithm, which aims to reduce the number of features by grouping mutually exclusive features into bundles and treating each bundle as a single feature. The researchers emphasize that one must be able to retrieve the original feature values from the bundle.

Question: If we have a categorical feature that has been one-hot encoded, won't this algorithm simply reverse the one-hot encoding back into a numeric encoding, thereby cancelling all the benefits of our previous encoding (removal of an artificial ordering between categories, etc.)?

3 Answers

See this article for a more detailed explanation of EFB. Here is a brief visual explanation from there with my own edits. I hope you can appreciate the high production quality of my updated graphic...

[Image: annotated diagram of EFB, including the "Part 1 of EFB" panel referenced below]

To answer your main question, see "Part 1 of EFB" in the diagram. It shows that features are ordered by their sparsity and bundled together with all the other features, not just with their one-hot siblings. So while it is not impossible, there is only a small chance of the one-hot columns recombining perfectly, provided you have enough other features. (A toy sketch of the bundle-and-recover idea follows the list below.)

There is a good amount of further information on EFB here. The main points are:

  1. There is a threshold for treating a value as approximately zero: total_sample_cnt / 10000. See here.
  2. EFB also works on "non-zero sparse" data by using the most frequent bin as the reference point (in the same way that sparse data uses zero as the reference point). See here.
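To make the bundle-and-recover idea concrete, here is a toy sketch in Python. This is not LightGBM's actual implementation (which works on histogram bins and offsets bin ranges, per the links above); it only illustrates how two mutually exclusive features can be packed into one column and recovered:

    import numpy as np

    # Two sparse features that are never non-zero on the same row.
    f1 = np.array([0, 3, 0, 0, 2, 0])
    f2 = np.array([1, 0, 0, 4, 0, 0])
    assert not np.any((f1 != 0) & (f2 != 0))  # mutually exclusive

    # Pack f2 into the same column by shifting its values past f1's range.
    offset = f1.max()
    bundle = f1 + np.where(f2 != 0, f2 + offset, 0)
    print(bundle)  # [4 3 0 7 2 0]

    # The originals are recoverable from the bundle, as the paper requires.
    f1_back = np.where(bundle <= offset, bundle, 0)
    f2_back = np.where(bundle > offset, bundle - offset, 0)
    assert np.array_equal(f1, f1_back) and np.array_equal(f2, f2_back)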

There is a good talk by James Lamb, one of the core developers. The references to EFB are here and here; the latter addresses the exact question you asked.

FYI - you can turn EFB off with enable_bundle=false if you wish to experiment with it, as shown below.
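For example (a minimal sketch; the random data is just a placeholder so the snippet runs):

    import numpy as np
    import lightgbm as lgb

    # Toy data purely to demonstrate the parameter; any dataset works.
    rng = np.random.default_rng(0)
    X = rng.random((200, 10))
    y = rng.integers(0, 2, 200)

    params = {
        "objective": "binary",
        "enable_bundle": False,  # EFB is on by default; False disables it
        "verbose": -1,
    }
    booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=10)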

For more information on why you shouldn't use one-hot encoding with trees, and why it might be a bad thing that it recombines into a numeric/ordinal encoding, see this excellent blog here. Also, see my answer here for more details on binary encoding and for a way to quickly test it against other encodings. Finally, you might be best off using the categorical_feature parameter anyway!
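A minimal sketch of passing a categorical column natively (the toy DataFrame here is made up purely for illustration):

    import pandas as pd
    import lightgbm as lgb

    # Pass the raw categorical column instead of one-hot encoding it.
    df = pd.DataFrame({
        "color": pd.Categorical(["red", "blue", "green", "blue"] * 25),
        "size": [1.0, 2.5, 0.3, 4.1] * 25,
    })
    y = [0, 1, 0, 1] * 25

    train = lgb.Dataset(df, label=y, categorical_feature=["color"])
    booster = lgb.train({"objective": "binary", "verbose": -1},
                        train, num_boost_round=10)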

Correct answer by josh on August 27, 2021

I've read that paper many times and in many ways. What I can say on the matter is that the paper does not describe explicitly what the framework actually does; it only hints at the intuitive idea of bundling features efficiently. In particular, it does not say that it performs a 'reversal of one-hot encoding', which is the heart of your question.

I tried passing categorical inputs directly and as one-hot encoded columns to compare training times. There was a significant difference: across multiple datasets, passing them directly was consistently faster than one-hot encoding.
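A rough sketch of that kind of comparison (synthetic data, so treat the timings as illustrative only; with a pandas category dtype, LightGBM picks up the column as categorical automatically):

    import time
    import numpy as np
    import pandas as pd
    import lightgbm as lgb

    rng = np.random.default_rng(0)
    n = 50_000
    cat = pd.Series(rng.integers(0, 100, n)).astype("category")
    y = rng.integers(0, 2, n)

    def fit_time(X):
        t0 = time.perf_counter()
        lgb.train({"objective": "binary", "verbose": -1},
                  lgb.Dataset(X, label=y), num_boost_round=50)
        return time.perf_counter() - t0

    t_native = fit_time(pd.DataFrame({"cat": cat}))        # native categorical
    t_onehot = fit_time(pd.get_dummies(cat, dtype=float))  # one-hot encoded
    print(f"native: {t_native:.2f}s  one-hot: {t_onehot:.2f}s")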

Possibilities:

1) It is possible that the LightGBM framework can detect from the sparsity pattern that the features are one-hot encoded, and that the algorithm does not apply EFB to them.

2) It is also possible that LightGBM applies EFB to one-hot encoded features, but that this is harmful, or at least not as good as applying EFB to direct categorical inputs. (I lean towards this one.)

Still, I do not think that EFB reverses one-hot encoding, since EFB is described as its own way of treating features; more likely it 'bundles the unbundled features' when given one-hot encoded inputs.

I used the word 'possibly' so many times because the paper is implicit on these points. My advice is to e-mail one of the authors of the paper; I do not think they would refuse to explain it. Or, if you are brave, go through the LightGBM GitHub repo and check the code yourself. I hope I could give you some insight. If you arrive at an exact answer on the matter, please let me know. Do not hesitate to discuss this further, I'll be around. Good luck, have fun!

Answered by Ugur MULUK on August 27, 2021

From what the paper describes, EFB serves to speed up training by reducing the number of features. It does not claim there are no other effects; of course, whether those other 'effects' are a real concern is another question.

Also, EFB does not deal only with one-hot encoded features, but with continuous features as well.

I also think it would not bundle all the one-hot encoded features together, because of the possibility of producing an overflow value.

Answered by Raymond Kwok on August 27, 2021
