R simple dplyr solution to filter

Stack Overflow Asked by Triss on February 11, 2021

I need a simple dplyr solution to filter a data.frame. For example, I have:

set.seed(100)
x = sort(sample(1:5,10,1))
y = sort(sample(6:10,10,1))

z = as.data.frame(cbind(x,y))

z
   x y
1  1 6
2  2 7
3  2 7
4  2 8
5  2 8
6  3 8
7  3 9
8  4 9
9  4 9
10 5 9

First, I need output with the duplicated rows removed, like this:

one = rbind(c(1,6),c(2,7), c(2,8), c(3,8), c(3,9), c(4,9), c(5,9))
one
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    2    8
[4,]    3    8
[5,]    3    9
[6,]    4    9
[7,]    5    9

Then I want unique x values, keeping for example the first entry for each x, like this:

two = rbind(c(1,6),c(2,7),c(3,8),c(4,9),c(5,9))
two
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5    9

As a last step, I need the number of different y values for each x:

three = rbind(c(1,1),c(2,2),c(3,2),c(4,2),c(5,1))
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    2
[4,]    4    2
[5,]    5    1

3 Answers

Hi, you can achieve this with the following code:

  1. Get distinct x,y
z %>% distinct(.keep_all=TRUE)
  2. Group by x, get row number per group and filter the first row with x,y
z %>%
  group_by(x) %>%
  mutate(row=row_number()) %>%
  filter(row==1) %>%
  select(-row)
  3. Group by x and count distinct y values
z %>%
  group_by(x) %>%
  summarise(values=n_distinct(y,na.rm=TRUE))
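
A minimal runnable sketch of these three steps, assuming dplyr is installed. The data frame is typed in from the printed output in the question rather than regenerated with sample(), and the names step1, step2 and step3 are made up for illustration. Step 2 here uses filter(row_number() == 1), which should be a shorter equivalent of adding and dropping a helper column:

```r
library(dplyr)

# The question's data frame, typed in from its printed output
# (hard-coded so the result does not depend on the RNG version).
z <- data.frame(
  x = c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5),
  y = c(6, 7, 7, 8, 8, 8, 9, 9, 9, 9)
)

# Step 1: unique (x, y) rows
step1 <- z %>% distinct()

# Step 2: first row of each x group, without a helper column
step2 <- z %>%
  group_by(x) %>%
  filter(row_number() == 1) %>%
  ungroup()

# Step 3: number of distinct y values per x
step3 <- z %>%
  group_by(x) %>%
  summarise(values = n_distinct(y))

print(step1)
print(step2)
print(step3)
```

The ungroup() call in step 2 is optional; it just returns a plain tibble so later operations are not silently applied per group.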

Answered by Gerardo Flores on February 11, 2021

Both of these can be achieved using distinct and group_by. For the first, group by both x and y and run distinct to get all unique combinations of x and y.

For the second, we group only by x and run distinct again. We have to include .keep_all = TRUE to make sure y stays in the resulting data frame, because it isn't one of the group_by columns. This works because distinct keeps the first row it encounters for each x. For details, see ?distinct.
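
The effect of .keep_all can also be seen on an ungrouped data frame. A small sketch, assuming dplyr is installed; z is typed in from the question's printed output, and only_x and first_per_x are illustrative names:

```r
library(dplyr)

# The question's data frame, typed in from its printed output.
z <- data.frame(
  x = c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5),
  y = c(6, 7, 7, 8, 8, 8, 9, 9, 9, 9)
)

# Without .keep_all, only the column named in distinct() survives.
only_x <- distinct(z, x)

# With .keep_all = TRUE, the first row seen for each x is kept whole,
# so y is still present in the result.
first_per_x <- distinct(z, x, .keep_all = TRUE)

print(names(only_x))
print(first_per_x)
```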

library(dplyr)  # provides %>%, group_by and distinct

set.seed(100)
x = sort(sample(1:5,10,1))
y = sort(sample(6:10,10,1))

z = as.data.frame(cbind(x,y))

# First scenario
z1 <- z %>%
  group_by(x, y) %>%
  distinct(x)

z1 output:

# A tibble: 7 x 2
# Groups:   x, y [7]
      x     y
  <int> <int>
1     1     6
2     2     7
3     2     8
4     3     8
5     3     9
6     4     9
7     5     9

# Second scenario
z2 <- z %>%
  group_by(x) %>%
  distinct(x, .keep_all = TRUE)

z2 output:

# A tibble: 5 x 2
# Groups:   x [5]
      x     y
  <int> <int>
1     1     6
2     2     7
3     3     8
4     4     9
5     5     9

Answered by TTS on February 11, 2021

  1. distinct(z, x, y)
  2. group_by(z, x) %>% slice(1) %>% ungroup()
  3. group_by(z, x) %>% summarize(count = n_distinct(y))
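
For the third step it matters whether rows or distinct values are counted: n() gives the number of rows per group, while n_distinct(y) gives the number of different y values. A short sketch comparing the two, assuming dplyr is installed; z is typed in from the question's printed output, and the column names rows and distinct_y are illustrative:

```r
library(dplyr)

# The question's data frame, typed in from its printed output.
z <- data.frame(
  x = c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5),
  y = c(6, 7, 7, 8, 8, 8, 9, 9, 9, 9)
)

counts <- z %>%
  group_by(x) %>%
  summarise(
    rows = n(),                 # number of rows in each x group
    distinct_y = n_distinct(y)  # number of different y values in each x group
  )

print(counts)
```

On this data the two columns differ wherever a y value repeats within an x group, for example x = 2, which has four rows but only two different y values.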

Answered by bcarlsen on February 11, 2021
