TransWikia.com

R First Row By Group When Condition Is Met

Stack Overflow Asked by bvowe on February 16, 2021

dataHAVE=data.frame(STUDENT=c(1,1,1,2,2,2,3,3,3),
SCORE=c(0,1,1,5,1,2,1,1,1),
CAT=c(3,10,7,4,5,0,4,5,1),
FOX=c(5,0,10,8,9,1,8,9,0))

dataWANT=data.frame(STUDENT=c(1,2,3),
SCORE=c(1,1,1),
CAT=c(10,5,4),
FOX=c(0,9,8))

I have ‘dataHAVE’ and want ‘dataWANT’ which takes the first row for every ‘STUDENT’ when ‘SCORE’ equals to 1. I am seeking a data.table solution because of it being a large data. I try this but do not know how to set the criteria for ‘SCORE’

dataWANT[,.SD[1],by = key(STUDENT)]

4 Answers

Convert the 'data.frame' to 'data.table' (setDT), grouped by 'STUDENT', specify the logical condition in i, get the index of the first row (.I[1]), extract that column ($V1) and subset the rows

library(data.table)
setDT(dataHAVE)[dataHAVE[SCORE == 1, .I[1], STUDENT]$V1]

.I returns row index. If we don't have a grouping column, it would return a vector i.e.

setDT(dataHAVE)[SCORE == 1, .I]
#[1] 1 2 3 4 5 6

when we provide the grouping column, by default, the .I returns with a named column V1 (we could override it by changing the name)

setDT(dataHAVE)[SCORE == 1, .(colindex = .I[1]), STUDENT]
#    STUDENT colindex
#1:       1        2
#2:       2        5
#3:       3        7

Nowe, we have two columns, 'STUDENT', 'colindex'. We are specifically interested in the 'colindex', so extract with standard procedures ($ or [[) and then use that as row index in i

i1 <- setDT(dataHAVE)[SCORE == 1, .(colindex = .I[1]), STUDENT]$colindex
i1
#[1] 2 5 7

This we use for subsetting

dataHAVE[i1]

Correct answer by akrun on February 16, 2021

You could use match to get 1st row where SCORE = 1 for each STUDENT.

library(data.table)

setDT(dataHAVE)
dataHAVE[, .SD[match(1, SCORE)], STUDENT]

#   STUDENT SCORE CAT FOX
#1:       1     1  10   0
#2:       2     1   5   9
#3:       3     1   4   8

Answered by Ronak Shah on February 16, 2021

Solution 1. There is a straightforward and comprehensive solution in two lines:

dataWANT <- dataHAVE[dataHAVE$SCORE == 1,]            #Filter score equals to 1
dataWANT <- dataWANT[!duplicated(dataWANT$STUDENT), ] #Remove duplicated students

Solution 2. However, if you prefer to solve in one line:

dataWANT <- dataHAVE[!duplicated(paste0(dataHAVE$STUDENT, dataHAVE$SCORE)) & dataHAVE$SCORE ==1, ]

That creates a logical vector showing which of the combinations that are not duplicated of preceding elements, and combine it with a test if 'SCORE' is 1.

Answered by Fernando Barbosa on February 16, 2021

Here is a base R option using subset + ave

subset(
  dataHAVE,
  ave(SCORE==1, STUDENT, FUN = function(x) seq_along(x) == min(which(x)))
)

which gives

  STUDENT SCORE CAT FOX
2       1     1  10   0
5       2     1   5   9
7       3     1   4   8

Answered by ThomasIsCoding on February 16, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP