TransWikia.com

Cumulative product of (1-previous_record)*current_record

Stack Overflow Asked on December 3, 2021

The data frame contains two variables (time and rate) and 10 observations

time <- seq(1:10) 
rate <- 1-(0.99^time)
dat <- data.frame(time, rate)

I need to add a new column (called new_rate).

new_rate is defined as follows

Note: new_rate_1 is the first observation of new the column new_rate, etc.

new_rate_1 = rate_1
new_rate_2 = (1-rate_1)*rate_2
new_rate_3 = (1-rate_1)*(1-rate_2)*rate_3
new_rate_4 = (1-rate_1)*(1-rate_2)*(1-rate_3)*rate_4
...
new_rate_10 = (1-rate_1)*(1-rate_2)*(1-rate_3)*(1-rate_4)*(1-rate_5)*(1-rate_6)*(1-rate_7)*(1-rate_8)*(1-rate_9)*rate_10

How this can be done in base R or dplyr?

3 Answers

A straightforward math approach using cumprod should work

> c(1, head(cumprod(1 - rate), -1)) * rate
 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049
 [7] 0.05500858 0.05830607 0.06022773 0.06083074

If you want to practice with recursions, you can try the method below

f <- function(v, k = length(v)) {
    if (k == 1) {
        return(v[k])
    }
    u <- f(v, k - 1)
    c(u, tail(u, 1) * (1 / v[k - 1] - 1) * v[k])
}

such that

> f(rate)
 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049
 [7] 0.05500858 0.05830607 0.06022773 0.06083074

Answered by ThomasIsCoding on December 3, 2021

In case you are still interested in how to do it with purrr::reduce family of functions. Here are two solutions:

  • In every iteration if you multiply the accumulated/previous value by (1/previous value of rate - 1) * (current value of rate) you will get the desired output in every row
library(purrr)

accumulate2(dat$rate[-nrow(dat)], dat$rate[-1], .init = dat$rate[1], 
           ~ ..1 * (1/..2 - 1) * ..3) %>%
  simplify()

 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049 0.05500858 0.05830607
 [9] 0.06022773 0.06083074

And also in base R we could do the following:

Reduce(function(x, y) {
  x * (1/dat$rate[y - 1] - 1) * dat$rate[y]
}, init = dat$rate[1], 
seq_len(nrow(dat))[-1], accumulate = TRUE)

 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049 0.05500858 0.05830607
 [9] 0.06022773 0.06083074

Answered by Anoushiravan R on December 3, 2021

cumprod to the rescue (hat-tip to @Cole for simplifying the code):

dat$rate * c(1, cumprod(1 - head(dat$rate, -1)))

The logic is that you are essentially doing a cumulative product of 1 - dat$rate, multiplied by the current step.
At the first step, you can just keep the existing value, but then you need to offset the two vectors so that the multiplication gives the desired result.

Proof:

out <- c(
dat$rate[1],
(1-dat$rate[1])*dat$rate[2],
(1-dat$rate[1])*(1-dat$rate[2])*dat$rate[3],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*dat$rate[4],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*dat$rate[5],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*dat$rate[6],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*dat$rate[7],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*dat$rate[8],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*(1-dat$rate[8])*dat$rate[9],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*(1-dat$rate[8])*(1-dat$rate[9])*dat$rate[10]
)

all.equal(
  dat$rate * c(1, cumprod(1 - head(dat$rate, -1))),
  out
)
#[1] TRUE

Answered by thelatemail on December 3, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP