How to sum pandas df rows where each cell contains a list?

Stack Overflow Asked by Alon Tru on December 11, 2021

I’m trying to sum the rows of my DataFrame as follows.
Say I have the df below (every cell in a row contains a vector/list of the same size).
In the real problem the number of columns is large and can vary, but I do have a list containing the names of those columns.

import pandas as pd

df = pd.DataFrame([
    [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
    ], columns=['a', 'b', 'c'])

I’m trying to create a new column that contains the sum of all the vectors in each row, added element-wise as np.array would do, so that I get the following vectors as a result:

[3,6,9]
[3,3,3]
[6,6,6]

and not what .sum(axis=1) does, which concatenates the lists:

[1,2,3,1,2,3,1,2,3]
[1,1,1,1,1,1,1,1,1]
[2,2,2,2,2,2,2,2,2]

Does anyone have an idea? Thanks in advance 🙂
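
For context, the concatenated result follows from how `+` works on plain Python lists, whereas NumPy arrays add element-wise. A minimal sketch using the values of the first row (the variable names are illustrative and not from the question):

```python
import numpy as np

row = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]  # the three cells of the first row

# Adding Python lists joins them end to end.
concatenated = row[0] + row[1] + row[2]

# Turning the cells into one 2-D array and summing over axis 0
# adds the vectors position by position.
elementwise = np.array(row).sum(axis=0).tolist()

print(concatenated)  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(elementwise)   # [3, 6, 9]
```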

2 Answers

Another way, using pd.Series.explode to flatten every list, summing across the columns, and then regrouping by the original row index:

df['sum'] = df.apply(pd.Series.explode).sum(axis=1).groupby(level=0).agg(list)

Output:

           a          b          c              sum
0  [1, 2, 3]  [1, 2, 3]  [1, 2, 3]  [3.0, 6.0, 9.0]
1  [1, 1, 1]  [1, 1, 1]  [1, 1, 1]  [3.0, 3.0, 3.0]
2  [2, 2, 2]  [2, 2, 2]  [2, 2, 2]  [6.0, 6.0, 6.0]
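
To see what each step of that chain does, here is a sketch that splits it into intermediate variables. It assumes the column names are kept in a list called `cols` (a hypothetical name), and it adds an `astype(int)` cast that is not part of the answer above, so that the sums are computed on an integer dtype rather than on object dtype:

```python
import pandas as pd

df = pd.DataFrame([
    [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
], columns=['a', 'b', 'c'])

cols = ['a', 'b', 'c']  # hypothetical list holding the column names to sum

# Step 1: one row per list element; the original row label is repeated
# for every element (0, 0, 0, 1, 1, 1, ...). The cast is an added assumption.
exploded = df[cols].apply(pd.Series.explode).astype(int)

# Step 2: add across the columns for each exploded row.
per_element = exploded.sum(axis=1)

# Step 3: gather the per-element sums back into one list per original row.
df['sum'] = per_element.groupby(level=0).agg(list)

print(df)
```

This only lines up when all lists within a row have the same length, which is the situation described in the question.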

Answered by Scott Boston on December 11, 2021

If all the lists have the same length, convert the values to a NumPy array and sum along axis 1 for better performance:

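import numpy as np
# Build an array of shape (rows, columns, vector length) and sum over the columns axis
df['Sum'] = np.array(df.to_numpy().tolist()).sum(axis=1).tolist()
print(df)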
           a          b          c        Sum
0  [1, 2, 3]  [1, 2, 3]  [1, 2, 3]  [3, 6, 9]
1  [1, 1, 1]  [1, 1, 1]  [1, 1, 1]  [3, 3, 3]
2  [2, 2, 2]  [2, 2, 2]  [2, 2, 2]  [6, 6, 6]
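
Since the question mentions a list of column names, the same idea can be restricted to just those columns. A minimal sketch, assuming the list is called `cols` (a hypothetical name) and that every listed cell holds a list of the same length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
], columns=['a', 'b', 'c'])

cols = ['a', 'b', 'c']  # hypothetical list holding the column names to sum

# Nested lists -> 3-D array of shape (n_rows, n_cols, vector_length).
stacked = np.array(df[cols].to_numpy().tolist())

# Summing over axis 1 collapses the columns and keeps one vector per row.
df['Sum'] = stacked.sum(axis=1).tolist()

print(df)
```

Selecting `df[cols]` first keeps an already existing result column, or any column that does not hold lists, out of the array.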

Answered by jezrael on December 11, 2021
