
sklearn - SimpleImputer in an empty Pipeline

Data Science Asked on January 6, 2021

When building a Pipeline, I end up with a setup that can be simplified like this:

FeatureUnion(NumericalPipeline(steps), CategoricalPipeline(steps))

Since this is one intermediate step in a larger Pipeline, I feed the preceding output into both of these and select the columns with the corresponding dtypes inside the Numerical and Categorical Pipelines.

For some datasets, however, no categorical columns are left, which causes the Pipeline to fail. I’ve tried returning an empty list and ‘None’ from the column selection, but neither made the Pipeline skip the “empty” CategoricalPipeline.

After further investigation, it turns out that the SimpleImputer() in the CategoricalPipeline causes the error. Depending on the order of the steps, one of the following messages is shown:

ValueError: Found array with 0 feature(s) (shape=(150, 0)) while a minimum of 1 is required.

ValueError: at least one array or dtype is required

Any ideas on how to get past the imputer when no column is present?

One Answer

All(?) the sklearn transformers validate their input data (through check_array, which check_X_y also calls), and that validation rejects an array with zero features. You could probably monkey-patch out that check, but that seems like overkill.
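To see the check in isolation, here is a minimal sketch that fits a SimpleImputer on an array with 150 rows and no columns. The array shape is chosen only to mirror the error message in the question; the exact wording of the message may differ slightly between scikit-learn versions.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# An input with rows but zero features, like an empty categorical selection.
X_empty = np.empty((150, 0))

raised = False
message = ""
try:
    SimpleImputer().fit(X_empty)
except ValueError as err:
    # Input validation rejects arrays that have no features.
    raised = True
    message = str(err)

print(raised, message)
```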

Instead, ColumnTransformer seems the way to go. It is designed to apply different transformers to different subsets of columns, which fits your situation. It deals with an empty column selector gracefully, by just not calling fit on that transformer:

transformers_ : list
The collection of fitted transformers... In case there were no columns selected, this will be the unfitted transformer.
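As a rough sketch of how this could look for your case: the step names, imputation strategies and the toy DataFrame below are assumptions for illustration, not taken from your pipeline. The dtype selection is done with make_column_selector, so the categorical branch simply receives an empty column list when no object columns exist.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical sub-pipelines standing in for the Numerical/Categorical ones.
numerical = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Each branch picks its columns by dtype; an empty selection is skipped.
preprocess = ColumnTransformer([
    ("num", numerical, make_column_selector(dtype_include="number")),
    ("cat", categorical, make_column_selector(dtype_include=object)),
])

# Toy data with numeric columns only, so the "cat" selector matches nothing.
X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
Xt = np.asarray(preprocess.fit_transform(X))
print(Xt.shape)
```

The resulting ColumnTransformer can be used as a single step in the larger Pipeline, in place of the FeatureUnion.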

Unless you're removing columns earlier in the pipeline? In that case, please provide that additional context.

Answered by Ben Reiniger on January 6, 2021
