TransWikia.com

How to combine and separate test and train data for data cleaning?

Data Science Asked on July 24, 2021

I am working on an ML model for which I have been given the data in 2 files, test.csv and train.csv. I want to perform data cleaning on both files together by concatenating them and then separating them again.

I know how to concatenate 2 dataframes, but after data cleaning how will I separate the two files? Please help me complete the code.

CODE

import pandas as pd

test = pd.read_csv('test.csv')
train = pd.read_csv('train.csv')

df = pd.concat([test, train])

# Data cleaning steps

# Separate them back into train and test sets for input to the model

3 Answers

There are several methods to choose from. If you insist on concatenating the two dataframes, then first add a new column called source to each DataFrame. Set its value to 'test' for the rows from test.csv and to 'train' for the rows from train.csv.

When you have finished cleaning the combined df, then use the source column to split the data again.
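A minimal sketch of the source-column approach, using small in-memory frames in place of test.csv and train.csv. The column name age and the median fill are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_csv('test.csv') and pd.read_csv('train.csv')
test = pd.DataFrame({"age": [30.0, None]})
train = pd.DataFrame({"age": [20.0, None, 40.0]})

# Tag each frame with where it came from before combining
test["source"] = "test"
train["source"] = "train"
df = pd.concat([test, train], ignore_index=True)

# Example cleaning step (an assumption): fill missing ages with the combined median
df["age"] = df["age"].fillna(df["age"].median())

# Use the source column to split again, then drop it
test_clean = df[df["source"] == "test"].drop(columns="source").reset_index(drop=True)
train_clean = df[df["source"] == "train"].drop(columns="source").reset_index(drop=True)
```

Both resulting frames have the same columns as the originals, with the cleaning applied consistently to each.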

An alternative method is to record all the operations you perform on the training set and simply repeat them for the test set. This won't work if you normalise values based on the combined population, because each set would then be normalised with its own statistics.
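To illustrate the caveat about normalisation, the sketch below standardises a hypothetical column x in two ways: separately per file, and with the statistics of the combined data. The data, the column name and the helper standardise are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the two sets have different means
train = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"x": [10.0, 20.0, 30.0]})

def standardise(values, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    return (values - mean) / std

# Repeating the same operation on each set uses each set's own statistics
train_separate = standardise(train["x"], train["x"].mean(), train["x"].std())
test_separate = standardise(test["x"], test["x"].mean(), test["x"].std())

# Normalising on the combined population uses one shared mean and std
combined = pd.concat([train["x"], test["x"]], ignore_index=True)
test_combined = standardise(test["x"], combined.mean(), combined.std())

# The two versions of the test column differ
print(test_separate.tolist())
print(test_combined.round(3).tolist())
```

Normalised separately, the train and test columns end up on the same scale even though the raw values are very different, which is why the two approaches are not interchangeable.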

Answered by fswings on July 24, 2021

Method 1: Write a function that performs a set of data cleaning operations. Then pass the train set, the test set, or whatever else you want to clean through that function. The result will be consistent.

Method 2: If you want to concatenate, then one way to do it is to add a column (called type here) with the value "test" for the test data set and "train" for the train data set. Perform your operations, then filter on that column to divide the data into 2 dataframes again:

data[data['type']=="test"]
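Method 1 could look roughly like the sketch below. The function name clean, the column names and the specific cleaning steps are hypothetical; substitute whatever operations your data needs.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same cleaning steps to any dataframe and return a new one."""
    out = df.copy()
    # Normalise column names: strip whitespace and lower-case them
    out.columns = out.columns.str.strip().str.lower()
    # Drop exact duplicate rows
    out = out.drop_duplicates()
    # Tidy a text column (assumed to be called "city" after renaming)
    out["city"] = out["city"].str.strip().str.title()
    return out.reset_index(drop=True)

# Hypothetical stand-ins for the two CSV files
train = pd.DataFrame({" City": [" paris", "ROME ", " paris"]})
test = pd.DataFrame({" City": ["berlin "]})

train_clean = clean(train)
test_clean = clean(test)
```

Because the function works on a copy, the original frames are left untouched and nothing needs to be concatenated or split.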

Answered by Amar nayak on July 24, 2021

Add an indicator column while concatenating the two dataframes, so you can later separate them again:

df = pd.concat([test.assign(ind="test"), train.assign(ind="train")])

Then later you can split them again:

test, train = df[df["ind"].eq("test")], df[df["ind"].eq("train")]
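After the split, the helper column ind is still present in both frames. Below is a small sketch of the full round trip, with made-up data in place of the CSV files, that also drops the helper column before the frames are passed to a model. The names test_out and train_out are illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files
test = pd.DataFrame({"a": [1, 2]})
train = pd.DataFrame({"a": [3, 4, 5]})

# Combine with an indicator column
df = pd.concat([test.assign(ind="test"), train.assign(ind="train")])

# ... data cleaning steps on df would go here ...

# Split again and drop the indicator column
test_out = df[df["ind"].eq("test")].drop(columns="ind")
train_out = df[df["ind"].eq("train")].drop(columns="ind")

# With no cleaning applied, the round trip reproduces the original frames
assert test_out.equals(test)
assert train_out.equals(train)
```

Note that pd.concat keeps each frame's original index by default, so df contains repeated index labels; pass ignore_index=True if your cleaning steps rely on a unique index.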

Answered by Erfan on July 24, 2021
