TransWikia.com

Create many CSV Files based on Pandas df column value

Stack Overflow Asked by Eithar on November 7, 2021

I have one csv file with thousands of rows , below is example.
What I need to do is :

  1. Iterate over the pandas df
  • create separate file for every department with list of staff(groupby or .loc)
  • To name the file :
    If manager is found then name the file with his name
    if no manager , then look for officer
    if no manager or officer , then write a comment in the last column
    the file should be named as this( IT-name2.csv) as name2 staff is manager in the below picture/example.

So I have two variables , dept. name and staffname

I was able to do this but there is lots of manual work , it should not be the case. I name every csv file myself , I have 100s of lines and I add the managername in the csv filename myself , which caused some errors and it could be changed in the future

Now , for every department I have groupby line, and line to save the csv file (manually enter the managername in the filename)

How this can be more automated ?

Many thanks
.

enter image description here

2 Answers

I wasn't able to understand the complete requirement, posting an answer which should help:

Get a list of unique departments:

dept_list = list(set(df['Dept.'].tolist()))

Now we want to run through the unique only department list and do some manipulation of the dataframe:

for dept in dept_list:
    sub_df = df.loc[df['Dept.'] == dept]
    # We want to send this to a file. The file name should be dept-officer/manager/other name.csv
    # Check if manager exists in sub_df['Role']
    if 'Manager' in df['Role'].tolist():
         name_employee = sub_df[subdf['Role']=='Manager'].iloc[-1]['name']
         sub_df.to_csv('{}-{}.csv' .format(dept, name_employee))
    elif 'Officer' in df['Role'].tolist():
         name_employee = sub_df[subdf['Role']=='Officer'].iloc[-1]['name']
         sub_df.to_csv('{}-{}.csv' .format(dept, name_employee))

Answered by Sid on November 7, 2021

It sounds like you're nearly there with the groupby. How about adding a custom function to modify the csv name depending on what you find in the groupby?

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(data={
    "Dept"    : np.random.choice(["IT", "HR", "Sales"], 20),
    "Staff"   : ["name" + num for num in np.random.randint(0,5,20).astype(str)],
    "Role"    : np.random.choice(a=["Manager", "Officer", "Admin"], size=20, p=[0.1, 0.3, 0.6]),
    "Comment" : [None] * 20
})

def to_csv(group):
    
    roles = group["Role"].tolist()
    dept = group["Dept"].iloc[0]
    staff_name = "NotFound"
    
    if "Manager" in roles:
        staff_name = group["Staff"].iloc[roles.index("Manager")]
    elif "Officer" in roles:
        staff_name = group["Staff"].iloc[roles.index("Officer")]
        
    group.to_csv(f"{dept}-{staf_name}.csv", index=False)

df.groupby("Dept").apply(to_csv)

list().index() will return the position of the first match which you can use to grab the name in that position from the group. It might not be the fastest thing in the world, but hopefully will get the job you have in mind done.

Answered by gherka on November 7, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP