Pandas Excel groupby/count

Stack Overflow Asked by Nathaniel on December 14, 2020

Hi, I'm trying to have my script count how many times each word appears in specified columns, where some cells hold multiple values separated by commas.

For example:

Labels                        Labs
a1, b3                         1
a2                             3
b3                             1

I would want two outputs.

Labels  # of labels
a1           1
a2           1
b3           2


Labels       Lab1     Lab3
a1            1        0
a2            0        1
b3            2        0

I was trying to use groupby to count, but the only output I get in Excel is the list of numbers below, and I can't tell which labels they belong to:

20
2
1
7
7

I have been playing with this, but I keep getting the same result shown above:

df1 = df.groupby('Labs').count()

One Answer

Keys

  1. Convert the comma-separated string into lists first.
  2. Use df.explode() to expand the entries.
  3. A pivoted aggregation (here, the size of each group) can be computed with df.pivot_table().

Setup

import io

import pandas as pd

df = pd.read_csv(io.StringIO("""
Labels                        Labs
a1, b3                         1
a2                             3
b3                             1
"""), sep=r"\s{2,}", engine="python")

# split each comma-separated string into a list (assumes a consistent ", " separator)
df["Labels"] = df["Labels"].str.split(", ")
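
If the real spreadsheet does not always use exactly ", " between labels, the split above will leave stray spaces or miss some separators. The following is a minimal sketch of a more tolerant split, assuming the only variation is whitespace around the commas. The frame `raw` and its values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample with inconsistent spacing around the commas.
raw = pd.DataFrame({
    "Labels": ["a1,b3", "a2", " b3 , a1 "],
    "Labs": [1, 3, 1],
})

# Strip outer whitespace, then split on a comma with optional spaces around it.
# A multi-character pattern is treated as a regular expression by default.
raw["Labels"] = raw["Labels"].str.strip().str.split(r"\s*,\s*")

# One row per label, ready for groupby / pivot_table.
exploded = raw.explode("Labels")
print(exploded["Labels"].tolist())
```

After this step every label is a clean string, so "b3" and " b3" are not counted as two different labels.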

First output:

df.explode("Labels").groupby("Labels").size()

Out[69]:
Labels
a1    1
a2    1
b3    2
dtype: int64
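
To get the two-column layout from the question, with a header for the counts, the Series can be turned back into a DataFrame. This is a small sketch and not part of the original answer; it rebuilds the already-split sample frame so that it runs on its own, and the column name "# of labels" is taken from the question.

```python
import pandas as pd

# Same sample data as above, with Labels already split into lists.
df = pd.DataFrame({
    "Labels": [["a1", "b3"], ["a2"], ["b3"]],
    "Labs": [1, 3, 1],
})

counts = (
    df.explode("Labels")
      .groupby("Labels")
      .size()
      .reset_index(name="# of labels")  # Series -> DataFrame with a named count column
)
print(counts)
```

Because the labels are now an ordinary column, writing `counts` to Excel keeps each count next to the label it belongs to.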

Second output:

(df.explode("Labels").pivot_table(index="Labels", columns="Labs", aggfunc="size")
    .fillna(0).astype(int))

Out[70]: 
Labs    1  3
Labels      
a1      1  0
a2      0  1
b3      2  0
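
The question asked for column headers like Lab1 and Lab3. One way to get them, sketched here as an alternative and not as the method used above, is pd.crosstab followed by add_prefix. crosstab counts co-occurrences directly and fills missing combinations with 0, so no fillna is needed. The sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Same sample data as above, with Labels already split into lists.
df = pd.DataFrame({
    "Labels": [["a1", "b3"], ["a2"], ["b3"]],
    "Labs": [1, 3, 1],
})

# One row per label; reset the index so it is unique again after explode.
exploded = df.explode("Labels").reset_index(drop=True)

# Count each (label, lab) pair, then rename columns 1, 3 -> Lab1, Lab3.
table = pd.crosstab(exploded["Labels"], exploded["Labs"]).add_prefix("Lab")
table.columns.name = None  # drop the "Labs" header above the columns
print(table)
```

The resulting frame keeps the labels as its index, so the row labels are preserved if it is exported with to_excel.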

Correct answer by Bill Huang on December 14, 2020
