R simple dplyr solution to filter

Question

I need a simple dplyr solution to filter a data.frame. For example, I have:
set.seed(100)
x = sort(sample(1:5,10,1))
y = sort(sample(6:10,10,1))

z = as.data.frame(cbind(x,y))

z

   x y
1  1 6
2  2 7
3  2 7
4  2 8
5  2 8
6  3 8
7  3 9
8  4 9
9  4 9
10 5 9

First, I need an output that drops the duplicated rows, like this:
one = rbind(c(1,6),c(2,7), c(2,8), c(3,8), c(3,9), c(4,9), c(5,9))
one

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    2    8
[4,]    3    8
[5,]    3    9
[6,]    4    9
[7,]    5    9

Then I want unique values of x, keeping for example the first entry for each, like this:
two = rbind(c(1,6),c(2,7),c(3,8),c(4,9),c(5,9))
two

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5    9

As a last step, I need the number of distinct values of y for each x:
three = rbind(c(1,1),c(2,2),c(3,2),c(4,2),c(5,1))

     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    2
[4,]    4    2
[5,]    5    1

Gerardo Flores · Answer

Hi, you can achieve this with the following code:

Get the distinct x, y combinations:

library(dplyr)
z %>% distinct()

Group by x, number the rows within each group, and keep the first row of each group:

z %>%
  group_by(x) %>%
  mutate(row=row_number()) %>%
  filter(row==1) %>%
  select(-row)

Group by x and count the distinct y values:

z %>% 
  group_by(x) %>%
  summarise(values=n_distinct(y,na.rm=TRUE))
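The three steps above can be sketched end to end as follows. This is a minimal example, assuming the data frame is typed in by hand from the table printed in the question (so the result does not depend on the R version's random number generator); the names `one`, `two`, and `three` follow the question, and `filter(row_number() == 1)` is a shortened form of the mutate/filter/select chain.

```r
library(dplyr)

# Assumption: the data frame exactly as printed in the question,
# entered literally instead of being drawn with sample().
z <- data.frame(
  x = c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5),
  y = c(6, 7, 7, 8, 8, 8, 9, 9, 9, 9)
)

# Step 1: unique (x, y) combinations
one <- z %>% distinct()

# Step 2: first row for each value of x
two <- z %>%
  group_by(x) %>%
  filter(row_number() == 1) %>%
  ungroup()

# Step 3: number of distinct y values for each x
three <- z %>%
  group_by(x) %>%
  summarise(values = n_distinct(y))

print(one)
print(two)
print(three)
```

For the data as printed, x = 4 occurs only with y = 9, so step 3 gives 1 for that group, whereas the hand-written `three` in the question lists 2.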

TTS · Answer

Both of these can be achieved using distinct and group_by. For the first, group by both x and y and run distinct to get all unique combinations of x and y. For the second, group only by x and run distinct again. We have to include .keep_all = TRUE to make sure y stays in the resulting data frame, because it isn't one of the group_by variables. This works because distinct keeps the first occurring record for each x. For details, check ?distinct.

library(dplyr)

set.seed(100)
x = sort(sample(1:5,10,1))
y = sort(sample(6:10,10,1))

z = as.data.frame(cbind(x,y))

# First scenario
z1 <- z %>%
  group_by(x, y) %>%
  distinct(x)
z1

output:

# A tibble: 7 x 2
# Groups:   x, y [7]
  x y
1 1 6
2 2 7
3 2 8
4 3 8
5 3 9
6 4 9
7 5 9

# Second scenario
z2 <- z %>%
  group_by(x) %>%
  distinct(x, .keep_all = TRUE)
z2

output:

# A tibble: 5 x 2
# Groups:   x [5]
  x y
1 1 6
2 2 7
3 3 8
4 4 9
5 5 9
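Since distinct keeps the first occurring row for each value of the columns it is given, the second scenario can probably also be written without group_by. A small sketch, assuming the data frame is entered by hand from the table printed in the question; the names `grouped` and `ungrouped` are only for illustration:

```r
library(dplyr)

# Assumption: the data frame as printed in the question, entered literally.
z <- data.frame(
  x = c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5),
  y = c(6, 7, 7, 8, 8, 8, 9, 9, 9, 9)
)

# Grouped form, as in the answer above
grouped <- z %>%
  group_by(x) %>%
  distinct(x, .keep_all = TRUE) %>%
  ungroup()

# Ungrouped form: distinct on x alone, keeping the other columns
ungrouped <- z %>% distinct(x, .keep_all = TRUE)

print(grouped)
print(ungrouped)
```

Both forms should keep the same first row per x; the ungrouped form returns a plain data frame rather than a grouped tibble.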

bcarlsen · Answer

distinct(z, x, y)
group_by(z, x) %>% slice(1) %>% ungroup()
group_by(z, x) %>% summarize(count = n_distinct(y))
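For the last step it matters whether rows or distinct values are counted per group. The sketch below puts n() and n_distinct(y) side by side, assuming the data frame is entered by hand from the table printed in the question; the column names `rows` and `distinct_y` are made up for the illustration.

```r
library(dplyr)

# Assumption: the data frame as printed in the question, entered literally.
z <- data.frame(
  x = c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5),
  y = c(6, 7, 7, 8, 8, 8, 9, 9, 9, 9)
)

counts <- z %>%
  group_by(x) %>%
  summarise(
    rows       = n(),            # number of rows in each x group
    distinct_y = n_distinct(y)   # number of different y values in each x group
  )

print(counts)
```

For x = 2 there are four rows but only two different y values (7 and 8), so only n_distinct(y) answers the question as asked.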
