
Is there a fundamental difference from creating a model for each value in a category?

Data Science Asked by Alex Dore on April 15, 2021

I am creating a few models based on service requests. The services being requested are not distributed equally: some are used sparingly, whereas others are quite common.

I had these services as categorical variables and built pipelines to incorporate them through one-hot encoding. I got to thinking that it may make more sense to train a model per service (at least for the common ones). Or does it make more sense to lump the less common ones into a special category?

I am struggling with the regression model, which currently has an R² of 0.41.

One Answer

Is there a fundamental difference from creating a model for each value in a category?

Yes there is.

If a model is trained for each specific value of a variable (a category), then only the subset of data for this category can be used to train and test the model, so each model has fewer instances to learn from. Consequences:

  • In the case of a small category, there might not be enough instances to obtain a reliable model.
  • Every model is independent. This can be good or bad depending on whether this independence is also true in the data or not, or to what extent:
    • If the features behave in a completely different way depending on the category, then it's better to create individual models since each can really exploit the specific patterns for this category.
    • If the features have a very similar behavior across the categories, then independent models by category would potentially lose a lot of information.
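The trade-off in the list above can be checked empirically. The following is a minimal sketch on synthetic data, not the asker's pipeline: the data generator, the slopes, the sample sizes and the helper names (`make_data`, `pooled_mse`, `per_category_mse`) are all assumptions made for illustration. It compares one pooled linear model (one-hot intercept per category, one shared slope) against an independent linear model per category.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(slopes, n_per_cat):
    """Synthetic data: y = slopes[cat] * x + noise, n_per_cat[cat] rows per category."""
    cats, xs, ys = [], [], []
    for c, (slope, n) in enumerate(zip(slopes, n_per_cat)):
        x = rng.normal(size=n)
        y = slope * x + rng.normal(scale=0.5, size=n)
        cats.append(np.full(n, c))
        xs.append(x)
        ys.append(y)
    return np.concatenate(cats), np.concatenate(xs), np.concatenate(ys)

def design(cat, x, n_cats):
    """One-hot intercept per category plus a single slope shared by all categories."""
    return np.column_stack([np.eye(n_cats)[cat], x])

def pooled_mse(train, test, n_cats):
    """Fit one model on all rows, return mean squared error on the test rows."""
    (ct, xt, yt), (cs, xs, ys) = train, test
    coef, *_ = np.linalg.lstsq(design(ct, xt, n_cats), yt, rcond=None)
    return np.mean((design(cs, xs, n_cats) @ coef - ys) ** 2)

def per_category_mse(train, test, n_cats):
    """Fit a separate intercept + slope per category, return overall test MSE."""
    (ct, xt, yt), (cs, xs, ys) = train, test
    sq_err = []
    for c in range(n_cats):
        A = np.column_stack([np.ones((ct == c).sum()), xt[ct == c]])
        coef, *_ = np.linalg.lstsq(A, yt[ct == c], rcond=None)
        B = np.column_stack([np.ones((cs == c).sum()), xs[cs == c]])
        sq_err.append((B @ coef - ys[cs == c]) ** 2)
    return np.mean(np.concatenate(sq_err))

# Scenario A: same slope everywhere, two categories with very few training rows.
train_a = make_data([2.0, 2.0, 2.0], [200, 5, 5])
test_a = make_data([2.0, 2.0, 2.0], [200, 200, 200])
# Scenario B: the slope differs by category, with plenty of rows for each.
train_b = make_data([2.0, -2.0, 0.0], [200, 200, 200])
test_b = make_data([2.0, -2.0, 0.0], [200, 200, 200])

for name, train, test in [("similar", train_a, test_a), ("different", train_b, test_b)]:
    print(name,
          "pooled:", pooled_mse(train, test, 3),
          "per-category:", per_category_mse(train, test, 3))
```

In scenario B the per-category fits should come out clearly ahead, because a single shared slope cannot represent opposite trends. In scenario A the tiny categories give the per-category fits very little to learn from, which is where pooling is expected to help.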

In conclusion the choice often depends on:

  • How much data is available for each category.
  • How much the behavior of the other features depends on the category.
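To get a feel for the first point, it can help to count the rows per category and, as the question suggests, lump the rare ones into a shared bucket. Below is a small sketch of that idea; the service names, the `min_count` threshold and the `lump_rare` helper are hypothetical and only serve as illustration.

```python
from collections import Counter

def lump_rare(categories, min_count, other_label="other"):
    """Replace every category seen fewer than min_count times with other_label."""
    counts = Counter(categories)
    return [c if counts[c] >= min_count else other_label for c in categories]

# Hypothetical service requests: two common services, two rare ones.
services = ["pothole"] * 6 + ["graffiti"] * 4 + ["noise"] + ["tree"]

print(Counter(services))   # rows available per category
lumped = lump_rare(services, min_count=3)
print(Counter(lumped))     # rare services now share one "other" bucket
```

The lumped column can then be one-hot encoded as before. If the pipeline is built with scikit-learn, recent versions of `OneHotEncoder` offer a `min_frequency` parameter that performs a similar grouping; it is worth checking the documentation for the installed version.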

Correct answer by Erwan on April 15, 2021
