How to restrict the columns to be passed to final classifier in PMML Pipeline

Data Science Asked by Akshay Tilekar on August 27, 2021

I am building an XGBoost PMML model using scikit-learn and sklearn2pmml.
I have some numerical, some categorical, and some datetime columns, from which I create new features inside the pipeline. When I try to train the model, it fails because the original categorical features are also passed to the final classifier by default. Is there any way to restrict the features by specifying the feature names?

One Answer

After a lot of digging and some help from the creator of sklearn2pmml, I managed to filter the final columns that are passed to the classifier.

Note: here, recorder is a DataFrameMapper object.

1. Getting the indexes of the categorical columns.

cat_cols = [recorder.transformed_names_.index(c) for c in categoricalCols if c in recorder.transformed_names_]
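
To see what this lookup produces, here is a minimal sketch on plain lists. The column names are hypothetical and the transformed_names list only stands in for recorder.transformed_names_, which, as far as I know, is populated only after the mapper has been applied to data.

```python
# Hypothetical stand-in for recorder.transformed_names_ (not taken from the original post).
transformed_names = ["city", "age", "signup_month", "plan"]

# Hypothetical categorical column names; "country" is absent from the mapper output.
categoricalCols = ["city", "plan", "country"]

# Same lookup as in the answer: keep the position of every categorical column
# that actually appears in the mapper output, and skip the ones that do not.
cat_cols = [transformed_names.index(c) for c in categoricalCols if c in transformed_names]

print(cat_cols)  # [0, 3]
```

The "if c in ..." guard matters because list.index raises ValueError for a name that is missing from the output.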

2. Adding a ColumnTransformer that drops those columns by index.

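from sklearn.compose import ColumnTransformer
from sklearn2pmml.pipeline import PMMLPipeline
import xgboost as xgb

pipeline = PMMLPipeline([
    ("mapper", recorder),
    # Drop the original categorical columns by index and pass the rest through.
    ("select", ColumnTransformer([("drop", "drop", cat_cols)], remainder='passthrough')),
    ("classifier", xgb.XGBClassifier())
])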
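
The effect of the "select" step can be checked in isolation. The following is a small sketch on a made-up NumPy array (the values and the cat_cols indexes are assumptions for illustration, not data from the question): the listed columns are dropped and the remaining ones are passed through in their original order.

```python
import numpy as np
from sklearn.compose import ColumnTransformer

# Made-up mapper output: columns 0 and 2 play the role of the original
# categorical columns, columns 1 and 3 the role of the derived features.
X = np.array([
    [0.0, 1.5, 2.0, 10.0],
    [1.0, 2.5, 0.0, 20.0],
    [0.0, 3.5, 1.0, 30.0],
])
cat_cols = [0, 2]  # assumed indexes, as computed in step 1

# Same construction as the "select" step: drop cat_cols, keep everything else.
selector = ColumnTransformer([("drop", "drop", cat_cols)], remainder="passthrough")
X_selected = selector.fit_transform(X)

print(X_selected.shape)  # (3, 2)
```

Only the two remaining columns reach whatever estimator follows this step in the pipeline.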

3. Fitting the pipeline to the data.

pipeline.fit(X_train, y_train)

4. Creating a PMML file from the pipeline.

from sklearn2pmml import sklearn2pmml

out_file = "XGBoost.pmml"
sklearn2pmml(pipeline, out_file)

Correct answer by Akshay Tilekar on August 27, 2021