
How should I convert cumulative data table to current values in R?

Stack Overflow Asked by youtube on January 27, 2021

I have a long data table that provides cumulative values only. What would be the best way to add another column that has the current values? Here is a short data table you can use as an example:

   ContractID       Date Cum_Sum_1M
1:          1 2018-02-01         10
2:          1 2018-02-20         30
3:          1 2018-03-12         50
4:          2 2018-02-01         10
5:          2 2018-02-12         30

2 Answers

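Try diff() combined with the first cumulative value to recover the values that went into the cumulative sum. Note that this version ignores ContractID, so the first row of a later contract is differenced against the last row of the previous contract (the -40 in row 4 of the output). Here is the code: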

#Code
df$Var <- c(df$Cum_Sum_1M[1],diff(df$Cum_Sum_1M))
df$CumVar2 <- cumsum(df$Var)

Output:

   ContractID       Date Cum_Sum_1M Var CumVar2
1:          1 2018-02-01         10  10      10
2:          1 2018-02-20         30  20      30
3:          1 2018-03-12         50  20      50
4:          2 2018-02-01         10 -40      10
5:          2 2018-02-12         30  20      30

Some data used:

#Data
df <- structure(list(ContractID = c(1L, 1L, 1L, 2L, 2L), Date = c("2018-02-01", 
"2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"), Cum_Sum_1M = c(10L, 
30L, 50L, 10L, 30L)), row.names = c("1:", "2:", "3:", "4:", "5:"
), class = "data.frame")
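
The -40 in row 4 of the ungrouped output comes from differencing across the contract boundary. If you would rather stay in base R, one possible sketch is to apply the same c(x[1], diff(x)) idea per contract with ave(). It assumes the rows are already sorted by date within each ContractID and that the first cumulative value of a contract is also its first current value.

```r
# Rebuild the example data so the snippet runs on its own
df <- data.frame(
  ContractID = c(1L, 1L, 1L, 2L, 2L),
  Date = c("2018-02-01", "2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"),
  Cum_Sum_1M = c(10L, 30L, 50L, 10L, 30L)
)

# ave() applies FUN within each ContractID and returns the results
# in the original row order, so they can be assigned straight back
df$Var <- ave(df$Cum_Sum_1M, df$ContractID,
              FUN = function(x) c(x[1], diff(x)))

# Sanity check: a per-contract cumsum of Var should give back Cum_Sum_1M
df$Check <- ave(df$Var, df$ContractID, FUN = cumsum)
df
```

Here Check should match Cum_Sum_1M row by row, which is the grouped counterpart of the CumVar2 column above.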

If the differences should be computed per contract, group by ContractID with dplyr:

library(dplyr)
#Code
df %>% group_by(ContractID) %>%
    mutate(NewVar=c(Cum_Sum_1M[1],diff(Cum_Sum_1M)))

Output:

# A tibble: 5 x 4
# Groups:   ContractID [2]
  ContractID Date       Cum_Sum_1M NewVar
       <int> <chr>           <int>  <int>
1          1 2018-02-01         10     10
2          1 2018-02-20         30     20
3          1 2018-03-12         50     20
4          2 2018-02-01         10     10
5          2 2018-02-12         30     20

Correct answer by Duck on January 27, 2021

Since the data is a data.table, data.table methods are a natural fit. We group by 'ContractID' and subtract the lagged value of 'Cum_Sum_1M' (from shift()) from the current value, keeping the first value of each group as it is.

library(data.table)
dt[, Var := c(first(Cum_Sum_1M), (Cum_Sum_1M - shift(Cum_Sum_1M))[-1]), by = ContractID]
dt
#   ContractID       Date Cum_Sum_1M Var
#1:          1 2018-02-01         10  10
#2:          1 2018-02-20         30  20
#3:          1 2018-03-12         50  20
#4:          2 2018-02-01         10  10
#5:          2 2018-02-12         30  20
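
A slightly shorter variant of the same idea, offered as a sketch rather than a replacement: shift() takes a fill argument, so padding the lag with 0 makes the first row of each group come out as its own cumulative value. The setorder() call and the conversion of Date to a Date class are additions here, made on the assumption that the rows may not arrive sorted.

```r
library(data.table)

# Rebuild the example data so the snippet runs on its own
dt <- data.table(
  ContractID = c(1L, 1L, 1L, 2L, 2L),
  Date = as.Date(c("2018-02-01", "2018-02-20", "2018-03-12",
                   "2018-02-01", "2018-02-12")),
  Cum_Sum_1M = c(10L, 30L, 50L, 10L, 30L)
)

# Sort by contract and date so each row directly follows its predecessor
setorder(dt, ContractID, Date)

# fill = 0L pads the lag with 0, so the first row of each contract
# keeps its cumulative value as the current value
dt[, Var := Cum_Sum_1M - shift(Cum_Sum_1M, fill = 0L), by = ContractID]
dt
```

This treats each contract's cumulative sum as starting from zero, which matches what first(Cum_Sum_1M) does in the code above.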

Data:

dt <- structure(list(ContractID = c(1L, 1L, 1L, 2L, 2L), Date = c("2018-02-01", 
"2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"), Cum_Sum_1M = c(10L, 
30L, 50L, 10L, 30L)), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))

Answered by akrun on January 27, 2021
