Hi, I'm trying to have my script count how many times each word appears in specified columns, where some cells contain multiple words separated by commas.

For example –

Labels                        Labs
a1, b3                         1
a2                             3
b3                             1

I would want two outputs.

Labels  # of labels
a1           1
a2           1
b3           2

Labels       Lab1     Lab3
a1            1        0
a2            0        1
b3            2        0

I tried using groupby to count, but the output I get in Excel doesn't show which labels the counts belong to.
I have been playing with this but keep getting the same result with:

df1 = df.groupby('Labs').count()
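To see why the labels disappear, here is a minimal sketch that rebuilds the example table and runs the same groupby. It assumes the sample data is whitespace-aligned with at least two spaces between the columns:

```python
import io
import pandas as pd

# Rebuild the example table; two or more spaces separate the columns.
df = pd.read_csv(io.StringIO("""
Labels                        Labs
a1, b3                         1
a2                             3
b3                             1
"""), sep=r"\s{2,}", engine="python")

# Grouping by Labs counts the non-null Labels cells for each Labs value.
# The result is indexed by Labs (1 and 3), so individual labels never appear.
df1 = df.groupby("Labs").count()
print(df1)
```

The counts here are the number of rows with Labs equal to 1 and 3; the comma-separated strings are never split into individual labels.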

One Answer


  1. Convert the comma-separated string into lists first.
  2. Use df.explode() to expand the entries.
  3. Use df.pivot_table() with aggfunc="size" to pivot the group sizes into a table of Labels (rows) by Labs (columns).
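The effect of the first two steps is easiest to see on the intermediate result. This sketch builds the example frame directly instead of parsing text, and assumes the ", " separator used in the question:

```python
import pandas as pd

# The example data, built directly instead of parsed from text.
df = pd.DataFrame({"Labels": ["a1, b3", "a2", "b3"], "Labs": [1, 3, 1]})

# Step 1: turn each comma-separated string into a Python list.
df["Labels"] = df["Labels"].str.split(", ")

# Step 2: explode gives every list element its own row; the other columns
# (here Labs) are repeated, and the original index is kept for each piece.
exploded = df.explode("Labels")
print(exploded)
```

Row 0 ("a1, b3") becomes two rows that share index 0 and Labs value 1, which is what lets later counts attribute b3 to Lab 1 twice.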


import io
import pandas as pd

# columns in the sample are separated by two or more spaces
df = pd.read_csv(io.StringIO("""
Labels                        Labs
a1, b3                         1
a2                             3
b3                             1
"""), sep=r"\s{2,}", engine="python")

# split string into list (assumes a consistent ", " separator)
df["Labels"] = df["Labels"].str.split(", ")
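If the separator is not consistent (for example "a1,b3" or "a2 , b3"), one option is to split on a regular expression instead. This is a sketch with hypothetical input values; the `regex=True` argument of `Series.str.split` requires pandas 1.4 or later:

```python
import pandas as pd

# Hypothetical input where the spacing around commas varies.
labels = pd.Series(["a1, b3", "a2", "b3,a1", "a2 ,  b3"])

# Strip outer whitespace, then split on a comma with any surrounding spaces.
split_labels = labels.str.strip().str.split(r"\s*,\s*", regex=True)
print(split_labels.tolist())
```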

First output:


a1    1
a2    1
b3    2
dtype: int64
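One way to produce this per-label count is to explode the lists and take the size of each label group. This is a sketch under the assumption that Labels already holds lists (as after the split above); `reset_index` is added only to match the "# of labels" header from the question:

```python
import pandas as pd

# Labels already split into lists, as produced by str.split above.
df = pd.DataFrame({"Labels": [["a1", "b3"], ["a2"], ["b3"]], "Labs": [1, 3, 1]})

# Explode the lists, then count rows per label (groupby sorts by label).
label_counts = df.explode("Labels").groupby("Labels").size()
print(label_counts)

# Turn the Series into a two-column table with the question's header.
first_output = label_counts.reset_index(name="# of labels")
print(first_output)
```

`df["Labels"].explode().value_counts()` gives the same counts, but ordered by frequency rather than by label.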

Second output:

df.explode("Labels").pivot_table(index="Labels", columns="Labs", aggfunc="size", fill_value=0)

Labs    1  3
a1      1  0
a2      0  1
b3      2  0
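As an alternative (swapped in here, not part of the answer above), `pd.crosstab` counts each (Labels, Labs) pair and fills missing pairs with 0 by default. The `add_prefix("Lab")` call is an optional touch to get the Lab1/Lab3 headers from the question:

```python
import pandas as pd

# Labels already split into lists, then exploded to one label per row.
df = pd.DataFrame({"Labels": [["a1", "b3"], ["a2"], ["b3"]], "Labs": [1, 3, 1]})
exploded = df.explode("Labels")

# Cross-tabulate labels against Labs values; absent pairs become 0.
table = pd.crosstab(exploded["Labels"], exploded["Labs"])

# Optional: rename columns 1 and 3 to Lab1 and Lab3.
table = table.add_prefix("Lab")
print(table)
```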

Correct answer by Bill Huang on December 14, 2020

