TransWikia.com

Find the mode value and frequency in R

Data Science Asked by DataGuy23 on July 4, 2021

I’m trying to write a function in R that gives the mode value of a column along with the number of times (the frequency) that value occurs. I want it to exclude missing (or blank) values and to handle ties by showing both values. When no value repeats, I want it to return the first-appearing value along with its frequency of 1.

Name         Color
Drew         Blue
Drew         Green
Drew         Red
Bob          Green
Bob          Green
Bob          Green
Bob          Blue
Jim          Red
Jim          Red
Jim          Blue
Jim          Blue

mode of Drew = Blue, 1
mode of Bob = Green, 3
mode of Jim = Red, Blue, 2

Here’s the function I have so far. It excludes NAs, but it does not show both values when there is a tie and it does not show the frequency. Any help is appreciated!

mode <- function(x) {
  if (anyNA(x)) x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

One Answer

You do not need a custom function to do this. Let dplyr handle it. Assuming your data is in a dataframe named df, here is what it might look like:

library(dplyr)                               # Provides %>%, group_by(), count(), mutate(), case_when(), select()

df %>%                                       # Set up the pipe
subset(complete.cases(df)) %>%               # Removes rows with NA values
group_by(Name) %>%                           # Groups by the Name column
count(Color) %>%                             # Counts each Color by Name, creates a new column n
mutate(max = max(n)) %>%                     # Creates a new column for the max(n) by Name
subset(n == max) %>%                         # Keeps only those rows where n equals the per-Name max
mutate(Keep = case_when(                     # Creates a dummy logical column named 'Keep'
   n > 1 ~ TRUE,                             # That is TRUE for n > 1 to keep ties
   n == 1 & Color == head(Color, 1) ~ TRUE,  # That is TRUE for the first row of n = 1
   TRUE ~ FALSE)) %>%                        # That is FALSE for all other cases
subset(Keep) %>%                             # Keeps only those rows where Keep is TRUE
select(Name, Mode = Color, Count = n)        # Keeps only the Name, Color, and n columns and
                                             # renames Color as Mode and n as Count

Here is the output:

 # A tibble: 4 x 3
 # Groups:   Name [3]
   Name  Mode   Count
   <fct> <fct>  <int>
 1 Bob   Green      3
 2 Drew  Blue       1
 3 Jim   Blue       2
 4 Jim   Red        2
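
Note that complete.cases() only drops NA values, so empty strings would still be counted as a colour. If the column also contains blanks, one option is to convert them to NA before running the pipeline. A minimal sketch, assuming character columns and a hand-built copy of the question's data plus two hypothetical extra rows (one blank, one NA) that are not in the original:

```r
# Sample data from the question, rebuilt by hand.
# The last two rows (a blank and an NA colour) are hypothetical additions
# used only to demonstrate the cleaning step.
df <- data.frame(
  Name  = c(rep("Drew", 3), rep("Bob", 4), rep("Jim", 4), "Jim", "Bob"),
  Color = c("Blue", "Green", "Red",
            "Green", "Green", "Green", "Blue",
            "Red", "Red", "Blue", "Blue",
            "", NA),
  stringsAsFactors = FALSE
)

# Treat empty or whitespace-only strings as missing.
df$Color[!is.na(df$Color) & trimws(df$Color) == ""] <- NA

# complete.cases() now drops both the blank row and the NA row.
clean <- df[complete.cases(df), ]
nrow(clean)  # 11, the original rows from the question
```

The columns here are character vectors rather than the factors shown in the tibble output, which should not change the counts.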

If you want a function, wrap the pipeline above in a function definition:

my_mode_func <- function(df){
df %>% 
   subset(complete.cases(df)) %>%
   group_by(Name) %>%
   count(Color) %>%
   mutate(max = max(n)) %>%
   subset(n == max) %>%
   mutate(Keep = case_when(
      n > 1 ~ TRUE,
      n == 1 & Color == head(Color,1) ~ TRUE,
      TRUE ~ FALSE)) %>%
   subset(Keep) %>%
   select(Name, Mode = Color, Count = n)
}
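
If you would rather avoid the dplyr dependency, the same idea can be sketched in base R by extending the tabulate(match()) approach from the question. This is only an illustration: mode_with_freq is a hypothetical helper name, the data frame is a hand-built copy of the question's sample, and "blank" is assumed to mean an empty or whitespace-only string.

```r
# Return every value tied for the highest count, plus that count.
# If nothing repeats, only the first-appearing value is returned.
mode_with_freq <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x) & trimws(x) != ""]    # drop NA and blank entries
  if (length(x) == 0) {
    return(data.frame(Mode = character(0), Count = integer(0)))
  }
  ux     <- unique(x)                     # unique() keeps first-appearance order
  counts <- tabulate(match(x, ux))        # counts aligned with ux
  top    <- max(counts)
  modes  <- ux[counts == top]             # all values tied for the top count
  if (top == 1) modes <- modes[1]         # no repeats: keep the first value only
  data.frame(Mode = modes, Count = top, stringsAsFactors = FALSE)
}

# Hand-built copy of the question's sample data.
df <- data.frame(
  Name  = c(rep("Drew", 3), rep("Bob", 4), rep("Jim", 4)),
  Color = c("Blue", "Green", "Red",
            "Green", "Green", "Green", "Blue",
            "Red", "Red", "Blue", "Blue"),
  stringsAsFactors = FALSE
)

# Apply the helper to each Name and stack the pieces into one data frame.
groups <- split(df$Color, df$Name)
result <- do.call(rbind, lapply(names(groups), function(nm) {
  m <- mode_with_freq(groups[[nm]])
  data.frame(Name = rep(nm, nrow(m)), m, stringsAsFactors = FALSE)
}))
result
```

With the question's data this gives Green (3) for Bob, Blue (1) for Drew, and both Red and Blue (2 each) for Jim. One difference from the dplyr version: count() returns the colours in sorted order, so its "first" value in an all-unique group is the first in sort order, whereas unique() here keeps the order of appearance.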

Answered by Ben Norris on July 4, 2021
