Create combinations of numbers and get a count of distinct combinations using Python

Question

I have df1 which contains a set of particular IDs as a column and df2 which contains a mix of IDs in each row (Figure shown below). I want to create a data frame which contains all the different combinations of IDs in df1 present in each row of df2 and get a count of all the different combinations.
df1=pd.DataFrame({'Id':["181","456","235","653","987","5","300"]})
df2=pd.DataFrame({'Tag Id':["213,435,181,954,987","456","215,435,181,754,987","213,12,432,300,653,987"})

Akshay Sehgal · Answer

Here is a faster approach using list comprehensions and itertools -
import itertools

#Get vocab of items
vocab = list(df1['Id'].astype(int))

#get filtered list of combinations in each row of df2
filtered = [[int(j) for j in i.split(',') if int(j) in vocab] for i in list(df2['Tag Id'])]

#Get counts of the combinations and display as a dataframe 
counts = list(zip(*np.unique(filtered, return_counts=True)))
pd.DataFrame(counts, columns=['Combinations', 'Counts'])

Combinations    Counts
0   [181, 987]      2
1   [300, 653, 987] 1
2   [456]           1

Quang Hoang · Answer

Let's try explode to separate the Tag Ids in df1, then merge with df1 and count:
s = (df2['Tag Id'].str.split(',')
         .explode()
         .reset_index()
    )

(df1.merge(s, left_on='Id', right_on='Tag Id')
    .sort_values('Tag Id')
    .groupby('index')
    .agg(Combination=('Id',','.join))
    ['Combination']
    .value_counts().reset_index()
)

Output:
         index  Combination
0      181,987            2
1  653,987,300            1
2          456            1

Create combinations of numbers and get a count of distinct combinations using Python

2 Answers

Add your own answers!

Ask a Question