Merging DataFrames with "uneven" data

Question

Excuse the title, I'm not even sure how to label what I'm trying to do. I have data in a DataFrame that looks like this:
Name     Month     Status
----     -----     ------
Bob      Jan       Good
Bob      Feb       Good
Bob      Mar       Bad
Martha   Feb       Bad
John     Jan       Good
John     Mar       Bad

Not every 'Name' has a row for every 'Month'. What I want to get is:
Name     Month     Status
----     -----     ------
Bob      Jan       Good
Bob      Feb       Good
Bob      Mar       Bad
Martha   Jan       N/A
Martha   Feb       Bad
Martha   Mar       N/A
John     Jan       Good
John     Feb       N/A
John     Mar       Bad

Where the missing months are filled in with a placeholder value in the 'Status' column.
What I've tried so far is to export all of the unique 'Month' values to a list, convert that to a DataFrame, then join/merge the two DataFrames. But I can't get anything to work.
What is the best way to do this?

BENY · Answer

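Use pivot, then stack the result back into long form while keeping the missing combinations (pandas 2.0 and later require keyword arguments for pivot, so df.pivot(*df) no longer works there):
df = df.pivot(index='Name', columns='Month', values='Status').stack(dropna=False).to_frame('Status').reset_index()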
     Name Month Status
0     Bob   Feb  Good
1     Bob   Jan  Good
2     Bob   Mar   Bad
3    John   Feb   NaN
4    John   Jan  Good
5    John   Mar   Bad
6  Martha   Feb   Bad
7  Martha   Jan   NaN
8  Martha   Mar   NaN
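A self-contained sketch of the same pivot idea, for readers who want to run it end to end. It rebuilds the sample data from the question and, as a swapped-in variation, uses melt instead of stack(dropna=False) to return to long form, since the handling of dropna in stack differs between pandas versions. The names wide and long_df are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Martha", "John", "John"],
    "Month": ["Jan", "Feb", "Mar", "Feb", "Jan", "Mar"],
    "Status": ["Good", "Good", "Bad", "Bad", "Good", "Bad"],
})

# Wide table: one row per Name, one column per Month.
# Name/Month pairs that never occur become NaN cells.
wide = df.pivot(index="Name", columns="Month", values="Status")

# Back to long form; melt keeps the NaN cells as rows.
long_df = wide.reset_index().melt(
    id_vars="Name", var_name="Month", value_name="Status"
)
long_df = long_df.sort_values(["Name", "Month"], ignore_index=True)
print(long_df)
```

Like the one-liner above, this only works if each Name/Month pair appears at most once; pivot raises an error on duplicates.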

cs95 · Answer

You can treat the month as a categorical column, then allow GroupBy to do the heavy lifting:
df['Month'] = pd.Categorical(df['Month'])
# observed=False keeps categories that do not occur for a given name
df.groupby(['Name', 'Month'], as_index=False, observed=False).first()

     Name Month Status
0     Bob   Feb   Good
1     Bob   Jan   Good
2     Bob   Mar    Bad
3    John   Feb    NaN
4    John   Jan   Good
5    John   Mar    Bad
6  Martha   Feb    Bad
7  Martha   Jan    NaN
8  Martha   Mar    NaN

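The secret sauce here is that when you group on a categorical column, pandas emits a row for every category, including Name/Month combinations that never occur, and fills those rows with NaN.
Caveat: This always sorts your data by the group keys (Name, then Month in category order, which is alphabetical here).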
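If the alphabetical month order is a problem, one possible workaround is to give the categorical an explicit order. The sketch below assumes the calendar order Jan, Feb, Mar for the sample data; month_order and out are illustrative names, and observed=False is passed explicitly because the default for that argument depends on the pandas version.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Martha", "John", "John"],
    "Month": ["Jan", "Feb", "Mar", "Feb", "Jan", "Mar"],
    "Status": ["Good", "Good", "Bad", "Bad", "Good", "Bad"],
})

# Assumed calendar order for the months present in the sample.
month_order = ["Jan", "Feb", "Mar"]
df["Month"] = pd.Categorical(df["Month"], categories=month_order, ordered=True)

# observed=False asks for every category per name, not only the observed ones.
out = (
    df.groupby(["Name", "Month"], observed=False)["Status"]
    .first()
    .reset_index()
)
print(out)
```

The names are still sorted alphabetically; only the month order within each name changes.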

sammywemmy · Answer

You can take advantage of pandas' indexing to reshape the data:
Step 1: create a new index from the unique values of the Name and Month columns:
new_index = pd.MultiIndex.from_product(
    (df.Name.unique(), df.Month.unique()), names=["Name", "Month"]
)

Step 2: set Name and Month as the index, reindex with new_index, and call reset_index to get your final output:
df.set_index(["Name", "Month"]).reindex(new_index).reset_index()
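Putting the two steps together, here is a minimal runnable sketch on the sample data from the question. As an addition not in the steps above, it passes fill_value="N/A" to reindex so that the gaps show the placeholder from the question instead of NaN; the variable out is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Martha", "John", "John"],
    "Month": ["Jan", "Feb", "Mar", "Feb", "Jan", "Mar"],
    "Status": ["Good", "Good", "Bad", "Bad", "Good", "Bad"],
})

# Step 1: every Name paired with every Month, in order of first appearance.
new_index = pd.MultiIndex.from_product(
    (df["Name"].unique(), df["Month"].unique()), names=["Name", "Month"]
)

# Step 2: reindex against the full product; missing pairs get "N/A".
out = (
    df.set_index(["Name", "Month"])
    .reindex(new_index, fill_value="N/A")
    .reset_index()
)
print(out)
```

Because unique() preserves the order of first appearance, the rows come out as Bob, Martha, John with months Jan, Feb, Mar, without the sorting that the pivot and groupby approaches introduce. The reindex step requires each Name/Month pair to be unique in the input.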

UPDATE 2021/01/08:
You can use the complete function from pyjanitor; at the moment you have to install the latest development version from GitHub:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor  # the pyjanitor package is imported as "janitor"
df.complete("Name", "Month")
