Query relating to Pandas Rows manipulation

Question

I have a question about Pandas data manipulation.
Say I have a dataframe, df, with the following structure:
A B C
1 1 7
5 3 3
3 3 2
7 5 2
5 NaN 2

The dataframe has 3 columns: A, B and C.
Column B holds mean values computed over column A.
For example, the value of B in the 3rd row (which is 3) is the mean of the first 3 rows of A (9/3).
Similarly, the value of B in the 4th row = (sum of the values in the 2nd, 3rd and 4th rows of A)/3.
Now, say B has many NaN values and A has none. How do I write a function or code to fill the NaN values in B according to the logic above?
I tried using loc and iloc but I guess I made some mistake.

Namita · Accepted Answer

Assuming you don't have NaNs in the first two entries of column B, the following code works:
import numpy as np
import pandas as pd

index_nan = df.index[df['B'].isna()]  # get all indices where B has NaNs

# mean of A over each NaN row and the two rows before it (assumes the default 0..n-1 integer index)
new_df = pd.DataFrame({'B': [np.mean(df['A'].iloc[i-2:i+1]) for i in index_nan]}, index=index_nan)

df.update(new_df)  # update those values of column B in df
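
The assumption stated above matters because, for a NaN in row 0 or 1, the slice start i-2 becomes negative and the slice no longer covers the intended rows. Below is a minimal sketch of one possible guard. It assumes a default integer index and that averaging over whatever rows are available is acceptable for the first two entries; that choice, the sample data, and the name `means` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: B is also missing in the very first row.
df = pd.DataFrame({
    'A': [1, 5, 3, 7, 5],
    'B': [np.nan, 3, 3, 5, np.nan],
})

index_nan = df.index[df['B'].isna()]  # positions where B is NaN

# Clamp the window start at 0 so early rows average over the rows available.
means = [df['A'].iloc[max(i - 2, 0):i + 1].mean() for i in index_nan]

# Write the computed means back into column B.
df.update(pd.DataFrame({'B': means}, index=index_nan))
print(df)
```

With this guard, a NaN in the first row is filled with the value of A in that row, since only one row is available.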

rwamit · Answer

Thank you for the above answer!
That definitely works. However, I found a computationally more efficient way using pandas' rolling:
df['D'] = df['A'].rolling(min_periods=1, window=3).mean()
df['B'] = np.where(df['B'].isnull(), df['D'], df['B'])

rolling (a pandas method, not a NumPy one) computes a statistic over a sliding window; here, the mean of the current value of A and the two values before it.
np.where selects an output based on a condition. Syntax: np.where(condition, value if true, value if false).
Column D can be dropped once it has been used.
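
For reference, the rolling approach can be sketched end to end on the sample data from the question. This is a minimal sketch rather than the only way to do it: it swaps in fillna for np.where so that no helper column is needed, and the variable name `rolling_mean` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the last value of B is missing.
df = pd.DataFrame({
    'A': [1, 5, 3, 7, 5],
    'B': [1, 3, 3, 5, np.nan],
    'C': [7, 3, 2, 2, 2],
})

# Mean of A over the current row and up to two rows before it.
rolling_mean = df['A'].rolling(window=3, min_periods=1).mean()

# Fill only the missing entries of B; existing values are left untouched.
df['B'] = df['B'].fillna(rolling_mean)

print(df)
```

Because min_periods=1 is set, the first two rows of `rolling_mean` are averages over fewer than three values, which only matters if B is missing there.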
