How to sum pandas df rows where each cell contains a list?

Question

I'm trying to sum the rows of my df as follows.
Let's say I have the df below (each cell in a row contains a vector/list of the same size).
In the real problem, I have a large number of columns, and that number can vary, but I do have a list that contains the names of those columns.
import pandas as pd

df = pd.DataFrame([
    [[1,2,3],[1,2,3],[1,2,3]],
    [[1,1,1],[1,1,1],[1,1,1]],
    [[2,2,2],[2,2,2],[2,2,2]]
    ], columns=['a','b','c'])

I'm trying to create a new column that contains the element-wise sum of all the vectors in each row, as np.array would do, so that I get the following vectors as a result:
[3,6,9]
[3,3,3]
[6,6,6]

and not what .sum(axis=1) gives, which concatenates the lists:
[1,2,3,1,2,3,1,2,3]
[1,1,1,1,1,1,1,1,1]
[2,2,2,2,2,2,2,2,2]

Can anyone think of an idea? Thanks in advance :)

Scott Boston · Answer

Another way using pd.Series.explode:
df['sum'] = df.apply(pd.Series.explode).sum(axis=1).groupby(level=0).agg(list)

Output:
           a          b          c              sum
0  [1, 2, 3]  [1, 2, 3]  [1, 2, 3]  [3.0, 6.0, 9.0]
1  [1, 1, 1]  [1, 1, 1]  [1, 1, 1]  [3.0, 3.0, 3.0]
2  [2, 2, 2]  [2, 2, 2]  [2, 2, 2]  [6.0, 6.0, 6.0]
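
To see what each stage of that one-liner does, here is a step-by-step sketch. The `cols` list (standing in for the list of column names mentioned in the question) and the `astype(int)` cast are my own assumptions, not part of the answer above; the cast is only there so the sums come out as integers rather than floats.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
        [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
        [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
    ],
    columns=["a", "b", "c"],
)

# Assumption: the list of column names the question says is available.
cols = ["a", "b", "c"]

# Step 1: explode every column so each list element gets its own row.
# The original row label is repeated once per element (0, 0, 0, 1, 1, 1, ...).
exploded = df[cols].apply(pd.Series.explode)

# Step 2 (assumption): cast from object dtype to int so the sums stay integers.
exploded = exploded.astype(int)

# Step 3: add across columns, giving one number per (row, position) pair.
position_sums = exploded.sum(axis=1)

# Step 4: regroup by the original row label and collect the sums into a list.
df["sum"] = position_sums.groupby(level=0).agg(list)

print(df)
```

This relies on every list in a given row having the same length, as stated in the question; otherwise the exploded columns would not line up.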

jezrael · Answer

If all the lists have the same length, create a NumPy array and sum it for better performance:
df['Sum'] = np.array(df.to_numpy().tolist()).sum(axis=1).tolist()
print(df)
           a          b          c        Sum
0  [1, 2, 3]  [1, 2, 3]  [1, 2, 3]  [3, 6, 9]
1  [1, 1, 1]  [1, 1, 1]  [1, 1, 1]  [3, 3, 3]
2  [2, 2, 2]  [2, 2, 2]  [2, 2, 2]  [6, 6, 6]
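
Since the question mentions that the column names are available in a list, the same idea can be restricted to just those columns. This is a minimal sketch under that assumption; the name `cols` and the intermediate `stacked` variable are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
        [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
        [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
    ],
    columns=["a", "b", "c"],
)

# Assumption: the list of column names the question says is available.
cols = ["a", "b", "c"]

# Turn the selected cells into a 3-D array of shape
# (n_rows, n_columns, vector_length). This only works when every
# list has the same length.
stacked = np.array(df[cols].to_numpy().tolist())

# Summing over axis=1 (the column axis) adds the vectors of each row
# element-wise, leaving an array of shape (n_rows, vector_length).
df["Sum"] = stacked.sum(axis=1).tolist()

print(df)
```

Selecting `df[cols]` first also keeps a previously added result column, such as `sum` or `Sum`, out of the calculation if the code is run more than once.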
