How to obtain the mean of the values located in the previous and next rows in R?

Stack Overflow Asked on November 26, 2020

I have a data frame in R with expenditure for many groups over the years. It basically looks like this (the grey columns):

Table

I want to add the mean spending for each year, as shown in the yellow column, based on the spending in the previous and following years.

I have tried using this code:

expenditures %>%
 group_by(id) %>%
 mutate(
   avg_exp = ifelse(year != 2011 && year != 2008,
                        mean(c(
                          Spending[Year %in% (Year-1)],
                          Spending[Year %in% (Year+1)])),
                        NA)) %>%
 View()

However, I keep getting all sorts of weird numbers. First of all, the ifelse only applies the else condition, even though the Year column is set as integer. Second, even if I set it to calculate the average in the else condition as well, all rows (in each group) are filled with the same number, and I don't know where it comes from (it is close to the overall average of the group but not the same).

Is there any simple way to do this?
Thanks

3 Answers

We could add the lag and lead of 'Spending' and divide by 2 after grouping by 'ID'. The default fill value in both lead and lag is NA, so the first and last 'Year' of each group will be NA in the 'Mean' column.

library(dplyr)
expenditures %>% 
    group_by(ID) %>%
    mutate(Mean = (lead(Spending) + lag(Spending))/2)

-output

# A tibble: 12 x 4
# Groups:   ID [3]
#      ID  Year Spending  Mean
#   <int> <int>    <dbl> <dbl>
# 1     1  2008       55  NA  
# 2     1  2009       57  60  
# 3     1  2010       65  63.5
# 4     1  2011       70  NA  
# 5     2  2008       80  NA  
# 6     2  2009       87  85  
# 7     2  2010       90  91  
# 8     2  2011       95  NA  
# 9     3  2008      120  NA  
#10     3  2009      123 125  
#11     3  2010      130 129  
#12     3  2011      135  NA  
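
To see where the NA values come from, here is a minimal sketch on a single vector, using the 'Spending' values of ID 1 from the data in this answer. The names x, prev_val, next_val and neighbour_mean are only illustrative.

```r
library(dplyr)

# Spending for ID 1, taken from the example data
x <- c(55, 57, 65, 70)

prev_val <- lag(x)    # NA 55 57 65: value from the previous row
next_val <- lead(x)   # 57 65 70 NA: value from the next row

# Any sum involving NA is NA, so the first and last positions stay NA
neighbour_mean <- (prev_val + next_val) / 2
neighbour_mean
# NA 60.0 63.5 NA
```

Inside group_by(ID) the same computation runs separately for each group, which is why every group starts and ends with NA.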

Or another option is to cbind the lead/lag output and then use rowMeans

expenditures %>%
   group_by(ID) %>%
   mutate(Mean = rowMeans(cbind(lead(Spending), lag(Spending))))
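
Both versions work on neighbouring rows, not neighbouring years, so they assume the rows are already sorted by 'Year' within each 'ID' and that no year is missing. If the order is not guaranteed, one possible sketch is to sort first. The shuffled data frame below is made up for illustration.

```r
library(dplyr)

# Hypothetical input: the ID 1 rows from the example, deliberately out of order
expenditures_shuffled <- data.frame(
  ID = c(1L, 1L, 1L, 1L),
  Year = c(2010L, 2008L, 2011L, 2009L),
  Spending = c(65, 55, 70, 57)
)

result <- expenditures_shuffled %>%
  arrange(ID, Year) %>%          # put each group in chronological order first
  group_by(ID) %>%
  mutate(Mean = (lead(Spending) + lag(Spending)) / 2) %>%
  ungroup()

result$Mean
# NA 60.0 63.5 NA
```

If a year were missing for some ID, the result would still average the two nearest rows, so the gap would have to be handled separately.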

data

expenditures <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L), Year = c(2008L, 2009L, 2010L, 2011L, 2008L, 2009L, 2010L, 
2011L, 2008L, 2009L, 2010L, 2011L), Spending = c(55, 57, 65, 
70, 80, 87, 90, 95, 120, 123, 130, 135)), class = "data.frame",
row.names = c(NA, 
-12L))

Correct answer by akrun on November 26, 2020

For completeness, here is a data.table answer with shift:

library(data.table)

setDT(expenditures)
expenditures[, Mean := (shift(Spending) + shift(Spending, type = 'lead'))/2, ID]
expenditures

#    ID Year Spending  Mean
# 1:  1 2008       55    NA
# 2:  1 2009       57  60.0
# 3:  1 2010       65  63.5
# 4:  1 2011       70    NA
# 5:  2 2008       80    NA
# 6:  2 2009       87  85.0
# 7:  2 2010       90  91.0
# 8:  2 2011       95    NA
# 9:  3 2008      120    NA
#10:  3 2009      123 125.0
#11:  3 2010      130 129.0
#12:  3 2011      135    NA

Answered by Ronak Shah on November 26, 2020

Here is a base R option using embed within ave

transform(
  expenditures,
  Mean = ave(Spending,ID,FUN = function(x) c(NA,rowMeans(embed(x,3)[,-2]),NA))
)

which gives

   ID Year Spending  Mean
1   1 2008       55    NA
2   1 2009       57  60.0
3   1 2010       65  63.5
4   1 2011       70    NA
5   2 2008       80    NA
6   2 2009       87  85.0
7   2 2010       90  91.0
8   2 2011       95    NA
9   3 2008      120    NA
10  3 2009      123 125.0
11  3 2010      130 129.0
12  3 2011      135    NA
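
As a rough illustration of what embed does here, the sketch below applies it to the 'Spending' values of ID 1 alone. The names x, windows and inner_means are only illustrative, and drop = FALSE is an addition not present in the code above.

```r
x <- c(55, 57, 65, 70)   # Spending for ID 1

# Each row of embed(x, 3) is a window of three consecutive values,
# newest first: column 1 = next, column 2 = current, column 3 = previous
windows <- embed(x, 3)
windows
#      [,1] [,2] [,3]
# [1,]   65   57   55
# [2,]   70   65   57

# Drop the middle (current) column and average the two neighbours;
# drop = FALSE keeps a matrix even when there is only one window
inner_means <- rowMeans(windows[, -2, drop = FALSE])
inner_means
# 60.0 63.5

# Pad with NA for the first and last row, which have only one neighbour
c(NA, inner_means, NA)
```

Note that embed(x, 3) needs at least three values, so a group with fewer than three rows would have to be handled separately.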

Data

> dput(expenditures)
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L), Year = c(2008L, 2009L, 2010L, 2011L, 2008L, 2009L, 2010L,
2011L, 2008L, 2009L, 2010L, 2011L), Spending = c(55, 57, 65,
70, 80, 87, 90, 95, 120, 123, 130, 135)), class = "data.frame", row.names = c(NA, 
-12L))

Answered by ThomasIsCoding on November 26, 2020
