TransWikia.com

Query relating to Pandas Rows manipulation

Data Science Asked by rwamit on January 2, 2021

I have a query regarding Pandas data manipulation.

Let’s say I have a DataFrame, df, with the following structure:

A   B    C
1   1    7
5   3    3
3   3    2
7   5    2
5   NaN  2
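To experiment with the answers below, the table can be rebuilt as a DataFrame. This is a minimal sketch that reads the table literally, with NaN as the missing value in B.

```python
import numpy as np
import pandas as pd

# Sample data from the table above; np.nan marks the missing value in B,
# which also makes B a float64 column
df = pd.DataFrame({
    "A": [1, 5, 3, 7, 5],
    "B": [1, 3, 3, 5, np.nan],
    "C": [7, 3, 2, 2, 2],
})
print(df)
```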

The DataFrame has three columns: A, B, and C.

Column B holds rolling means of column A: each value is the mean of A over the current row and the two rows before it.

For example:

The value of B in the 3rd row (3) is the mean of the first three values of A: (1 + 5 + 3)/3 = 9/3 = 3.
Similarly, the value of B in the 4th row is the mean of the 2nd, 3rd, and 4th values of A: (5 + 3 + 7)/3 = 15/3 = 5.
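The rule can be checked against the known values of B. This sketch assumes, as the first two rows of the table suggest, that rows with fewer than three preceding values average over whatever is available (1/1 = 1 and (1 + 5)/2 = 3); pandas' rolling(window=3, min_periods=1) expresses exactly that.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 3, 7, 5], "B": [1, 3, 3, 5, np.nan]})

# Mean of A over the current row and up to two rows before it.
# min_periods=1 lets rows 1 and 2 average over the 1 or 2 values available.
expected = df["A"].rolling(window=3, min_periods=1).mean()

known = df["B"].notna()
print(np.allclose(expected[known], df["B"][known]))  # True
print(expected.iloc[-1])  # 5.0, i.e. (3 + 7 + 5)/3
```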

Now suppose B has many NaN values while A has none. How do I write a function (or other code) to fill the NaN values in B using the logic above?

I tried using loc and iloc, but I must have made a mistake somewhere.
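For reference, a plain loop with iloc can do this. The function name fill_b_with_rolling_mean and its window parameter are hypothetical; the sketch works by position, so it does not depend on the index labels, and it clips the window start at 0 so NaNs in the first rows are also filled.

```python
import numpy as np
import pandas as pd


def fill_b_with_rolling_mean(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Return a copy of df where NaNs in B are replaced by the mean of A over
    the current row and up to (window - 1) preceding rows."""
    out = df.copy()
    b_col = out.columns.get_loc("B")  # integer position of column B
    for pos in range(len(out)):
        if pd.isna(out.iloc[pos, b_col]):
            start = max(pos - window + 1, 0)  # clip so early rows use a shorter window
            out.iloc[pos, b_col] = out["A"].iloc[start:pos + 1].mean()
    return out


df = pd.DataFrame({"A": [1, 5, 3, 7, 5], "B": [1, 3, 3, 5, np.nan]})
print(fill_b_with_rolling_mean(df)["B"].tolist())  # [1.0, 3.0, 3.0, 5.0, 5.0]
```

The loop is easy to follow but runs in Python row by row, so a vectorized approach is preferable on large frames.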

2 Answers

Assuming none of the NaNs are in the first two rows of column B (and that df has the default 0..n-1 integer index), the following code works:

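import numpy as np
import pandas as pd

index_nan = df.index[df['B'].isna()]  # get all index labels where B is NaN

# mean of A over row i and the two rows before it (positional slice; labels == positions with the default index)
new_df = pd.DataFrame({'B': [np.mean(df['A'].iloc[i-2:i+1]) for i in index_nan]}, index=index_nan)

df.update(new_df)  # overwrite the NaN values of column B in df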
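To illustrate why the answer excludes NaNs in the first two rows: for row 0 the slice iloc[-2:1] is empty, so its mean would be NaN. The sketch below applies the same update approach with a hypothetical guard, max(i - 2, 0), that shortens the window for early rows; it still assumes the default integer index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 3, 7, 5], "B": [np.nan, 3, 3, 5, np.nan]})

first = 0  # a NaN in the very first row
print(df["A"].iloc[first - 2:first + 1].tolist())  # [] because iloc[-2:1] is empty

# Hypothetical guard: clip the slice start at 0 so early rows use a shorter window
index_nan = df.index[df["B"].isna()]
new_df = pd.DataFrame(
    {"B": [df["A"].iloc[max(i - 2, 0):i + 1].mean() for i in index_nan]},
    index=index_nan,
)
df.update(new_df)
print(df["B"].tolist())  # [1.0, 3.0, 3.0, 5.0, 5.0]
```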

Correct answer by Namita on January 2, 2021

Thank you for the answer above; it definitely works. However, I found a more computationally efficient way using pandas' rolling():

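import numpy as np

df['D'] = df['A'].rolling(min_periods=1, window=3).mean()  # mean of the current and up to 2 previous values of A

df['B'] = np.where(df['B'].isnull(), df['D'], df['B'])  # keep B where present, otherwise take D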

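  • rolling is a pandas method (Series.rolling) that computes statistics over a sliding window. Here it covers the current row and the two rows before it, .mean() averages each window, and min_periods=1 lets the first rows, which have fewer than 3 values, still get a result.
  • np.where selects values element-wise based on a condition: np.where(condition, value_if_true, value_if_false).
  • Column D is only a helper and can be dropped once it has been used, e.g. df = df.drop(columns='D').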
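A variation on this answer swaps np.where for Series.fillna, which only fills the missing entries and makes the helper column D unnecessary. This is a sketch on the sample data, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 3, 7, 5], "B": [1, 3, 3, 5, np.nan], "C": [7, 3, 2, 2, 2]})

# Mean of the current and up to 2 previous values of A
rolling_mean = df["A"].rolling(window=3, min_periods=1).mean()

# fillna aligns on the index and only replaces NaNs, so existing B values are kept
df["B"] = df["B"].fillna(rolling_mean)
print(df["B"].tolist())  # [1.0, 3.0, 3.0, 5.0, 5.0]
```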

Answered by rwamit on January 2, 2021
