
Add column taking difference of values in one column grouped by other column

Stack Overflow Asked by anntree on January 23, 2021

I have a df called diff_colour_valid_int1:

> head(diff_colour_valid_int1)
# A tibble: 6 x 5
# Groups:   search_difficulty, cue_validity [3]
  search_difficulty cue_validity cue_colour           meanrt stdev
  <fct>             <fct>        <fct>                 <dbl> <dbl>
1 difficult         FALSE        Match (Color) cue     0.990 0.158
2 difficult         FALSE        Mismatch (Onset) cue  0.972 0.150
3 difficult         TRUE         Match (Color) cue     0.828 0.133
4 difficult         TRUE         Mismatch (Onset) cue  0.881 0.177
5 easy              FALSE        Match (Color) cue     0.813 0.132
6 easy              FALSE        Mismatch (Onset) cue  0.801 0.137
> 

I want to add a column called cue_effect that holds the difference between the two meanrt values in each cue_validity pair (e.g. the first two rows, which are both FALSE). The first six values of the column would be:

cue_effect
<dbl>
0.018
0.018
-0.053
-0.053
0.012
0.012

Any suggestions are appreciated. Thanks.

2 Answers

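We can use rleid from data.table to create a grouping column that marks each run of consecutive identical cue_validity values, then take the negated diff of meanrt (first value minus second) within each group. This assumes every run contains exactly two rows.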

library(dplyr)
library(data.table)
diff_colour_valid_int1 %>%
    group_by(search_difficulty, grp = rleid(cue_validity)) %>%
    mutate(cue_effect = -diff(meanrt))

Output:

# A tibble: 6 x 7
# Groups:   search_difficulty, grp [3]
#  search_difficulty cue_validity cue_colour           meanrt stdev   grp cue_effect
#  <chr>             <lgl>        <chr>                 <dbl> <dbl> <int>      <dbl>
#1 difficult         FALSE        Match (Color) cue     0.99  0.158     1     0.018 
#2 difficult         FALSE        Mismatch (Onset) cue  0.972 0.15      1     0.018 
#3 difficult         TRUE         Match (Color) cue     0.828 0.133     2    -0.053 
#4 difficult         TRUE         Mismatch (Onset) cue  0.881 0.177     2    -0.053 
#5 easy              FALSE        Match (Color) cue     0.813 0.132     3     0.0120
#6 easy              FALSE        Mismatch (Onset) cue  0.801 0.137     3     0.0120

Data:

diff_colour_valid_int1 <- structure(list(search_difficulty = c("difficult", "difficult", 
"difficult", "difficult", "easy", "easy"), cue_validity = c(FALSE, 
FALSE, TRUE, TRUE, FALSE, FALSE), cue_colour = c("Match (Color) cue", 
"Mismatch (Onset) cue", "Match (Color) cue", "Mismatch (Onset) cue", 
"Match (Color) cue", "Mismatch (Onset) cue"), meanrt = c(0.99, 
0.972, 0.828, 0.881, 0.813, 0.801), stdev = c(0.158, 0.15, 0.133, 
0.177, 0.132, 0.137)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Correct answer by akrun on January 23, 2021

You can use cumsum with lag to start a new group every time the cue_validity value changes, then calculate the difference in meanrt within each group.

library(dplyr)

diff_colour_valid_int1 %>%
  group_by(search_difficulty, 
           group = cumsum(cue_validity != lag(cue_validity, 
                   default = first(cue_validity)))) %>%
  mutate(cue_effect = na.omit(lag(meanrt) - meanrt)) %>%
  ungroup() %>%
  select(-group)

#  search_difficulty cue_validity cue_colour           meanrt stdev cue_effect
#  <chr>             <lgl>        <chr>                 <dbl> <dbl>      <dbl>
#1 difficult         FALSE        Match (Color) cue     0.99  0.158     0.018 
#2 difficult         FALSE        Mismatch (Onset) cue  0.972 0.15      0.018 
#3 difficult         TRUE         Match (Color) cue     0.828 0.133    -0.053 
#4 difficult         TRUE         Mismatch (Onset) cue  0.881 0.177    -0.053 
#5 easy              FALSE        Match (Color) cue     0.813 0.132     0.0120
#6 easy              FALSE        Mismatch (Onset) cue  0.801 0.137     0.0120
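The change-detection step can be looked at on its own. The following is a small base R sketch, an illustration under assumed names rather than the answer's code. It compares each element with its predecessor and takes a running sum of the changes, so every change starts a new group id; changed and group are names introduced here.

```r
# cue_validity column copied from the question's data
cue_validity <- c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)

# TRUE wherever a value differs from the one before it;
# the first element has no predecessor, so it is set to FALSE
changed <- c(FALSE, cue_validity[-1] != cue_validity[-length(cue_validity)])

# Running count of changes: each change bumps the group id by one
group <- cumsum(changed)
print(group)
# 0 0 1 1 2 2
```

The ids start at 0 rather than 1, which does not matter for grouping; only the boundaries between groups are used.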

Answered by Ronak Shah on January 23, 2021
