
Need help in grouping rows by year and differentiating months

Stack Overflow Asked on January 19, 2021

I have a dataframe that looks like this:

dataframe:

Date    Revenue   
2009      15       
dec       15       
2010      450       
jan       13       
feb       14       
mar       14       
apr       10       
may       10       
jun       31       
jul       99    
aug       43  
sep       87 
oct       32  
nov       54     
dec       43
2011      67

The same pattern continues for several years, through 2019. Each row that contains a year holds the aggregate revenue for that year. 2009 is the only year with a single data point (December).

The dataframe is from a pivot table imported from excel that had months subgrouped for every year.

Each month is in the same column as the year, and months from different years are not differentiated. I need to plot a line graph of monthly revenue for each year (that is, one line per year showing the revenue month by month), but I can't do that while the months from different years are indistinguishable.

How can I make subgroups of months by year? Or how can I add a new column that assigns a year to each block of rows (roughly every 12 rows) while excluding the year rows themselves?

Thank you!

One Answer

I would suggest the following approach: reformat your data and fill in the year for every row. In your data (I have defined the output you included as df), the Date variable mixes numeric and character values. The code below creates a new variable that holds the year wherever Date is numeric, then fills the missing rows downward so that every month is assigned to its year group. The year rows are then dropped and the plot is drawn. Because 2009 has only one value, no line appears for it, and for 2011 the sample contains only the total. With your entire data you will get the complete picture for all years. Here is a tidyverse approach:

library(tidyverse)
#Data
df <- structure(list(Date = c("2009", "dec", "2010", "jan", "feb", 
"mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", 
"dec", "2011"), Revenue = c(15L, 15L, 450L, 13L, 14L, 14L, 10L, 
10L, 31L, 99L, 43L, 87L, 32L, 54L, 43L, 67L)), class = "data.frame", row.names = c(NA, 
-16L))

The code:

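#Code
df %>%
  #Year rows convert to a number; month rows become NA
  #(suppressWarnings hides the expected coercion warning)
  mutate(Var = suppressWarnings(as.numeric(Date))) %>%
  #Carry each year down to the month rows that follow it
  fill(Var) %>%
  #Drop the year rows to exclude the yearly totals
  filter(is.na(suppressWarnings(as.numeric(Date)))) %>%
  #Order the months chronologically instead of alphabetically
  mutate(Date = factor(Date, levels = c("jan","feb","mar","apr","may",
                                        "jun","jul","aug","sep","oct",
                                        "nov","dec"), ordered = TRUE)) %>%
  #Finally plot one line per year
  ggplot(aes(x = Date, y = Revenue, group = factor(Var), color = factor(Var))) +
  geom_line() +
  theme_bw()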

Output: [plot: Revenue by month, one colored line per year]
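Since the question says each year row holds the aggregate revenue for that year, one way to sanity-check the year assignment is to compare the sum of the months in each year against that year's total. The following is only a rough sketch under a few assumptions: year rows are taken to be exactly four digits, and the names is_year, Year, monthly, totals and check are made up for illustration. It uses the same sample data as above.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  Date = c("2009", "dec", "2010", "jan", "feb", "mar", "apr", "may",
           "jun", "jul", "aug", "sep", "oct", "nov", "dec", "2011"),
  Revenue = c(15L, 15L, 450L, 13L, 14L, 14L, 10L, 10L,
              31L, 99L, 43L, 87L, 32L, 54L, 43L, 67L),
  stringsAsFactors = FALSE
)

# Assumption: a year row is any Date made of exactly four digits
labelled <- df %>%
  mutate(is_year = grepl("^[0-9]{4}$", Date),
         # Blank out month names first so as.integer() does not warn
         Year = as.integer(replace(Date, !is_year, NA))) %>%
  # Carry each year down to the month rows that follow it
  fill(Year)

# Month rows only, now labelled with their year
monthly <- labelled %>%
  filter(!is_year) %>%
  select(Year, Month = Date, Revenue)

# Year rows only, i.e. the reported yearly totals
totals <- labelled %>%
  filter(is_year) %>%
  select(Year, Total = Revenue)

# Compare the sum of the months with the reported total for each year
check <- monthly %>%
  group_by(Year) %>%
  summarise(MonthSum = sum(Revenue), .groups = "drop") %>%
  left_join(totals, by = "Year") %>%
  mutate(Matches = MonthSum == Total)

print(check)
```

On the sample data the months add up to the reported totals for both 2009 (15) and 2010 (450). 2011 has no month rows in the sample, so it does not appear in the check. A mismatch on the full data would suggest that some month rows were attached to the wrong year.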

Correct answer by Duck on January 19, 2021
