Data Science Asked by claudius on October 30, 2020
Let success metric(for some business use case I am working on) be a continuous random variable S.
The mean of pdf defined on S indicates the chance of success. Higher the mean more is the chance of success. Let std dev of pdf defined on S indicates risk. Lower the std deviation lower the risk of failure.
I have data,let’s call them X, which affects S. Let X also be modelled as bunch of random variables.
P(S|X) changes based on X.
The problem statement is I want to pick Xs such that P(S|X) has mean higher than P(S) and std deviation lower than P(S).
Just to illustrate my point I have taken X of 1 dimension.
Scatter plot between X(horizontal) and Y(on vertical):
You can see that P(S|X) changes for different values of X as given in the below plot:
For X between 4500 and 10225, mean of S is 3.889 and std dev is 0.041 compared to mean of 3.7 and std dev of 0.112 when there is no constraint on X.
What I am interested in is given S and bunch of Xs… pick range of Xs such that resulting distribution of P(S|X) has higher mean and lower standard deviation… Please help me find a standard technique that would help me achieve this.
Also I don’t want to condition on X such that number of samples are too small to generalise.I want to avoid cases such as on the left most side of tree where number of samples is 1.
Just apply an optimization to search for the X values that satisfy the criteria you're looking for. Here's a simple demo:
set.seed(123)
mu_x_true = 1e4
mu_y_true = 3.75
n = 1e2
x <- rpois(n, mu_x_true)
y <- rnorm(n, sqrt(mu_y_true))^2
plot(x, y)
# conditions:
# E[Y|X] > E[Y]
# std(Y|x) < std(Y)
mu_y_emp = mean(y)
sd_y_emp = sd(y)
objective <- function(par, alpha=0.5){
if (par[1]>par[2]) par = rev(par)
ix <- which((par[1] < x) & (x < par[2]))
k <- length(ix)
if (k==0) return(1e12)
mu_yx <- mean(y[ix])
sd_yx <- sd(y[ix])
alpha*(mu_y_emp - mu_yx) + (1-alpha)*(sd_yx - sd_y_emp)
}
init <- mean(x) + c(-sd(x), sd(x))
test <- optim(objective, par=init)
ix <- which((par[1] < x) & (x < par[2]))
mean(y[ix]) > mean(y)
# TRUE
sd(y) > sd(y[ix])
# TRUE
Answered by David Marx on October 30, 2020
0 Asked on June 21, 2021 by molse
1 Asked on June 21, 2021 by nku
2 Asked on June 21, 2021 by michaelrazum
2 Asked on June 21, 2021 by duhaime
computer vision convolutional neural network image classification machine learning neural network
0 Asked on June 21, 2021 by faizi
dataset image classification image preprocessing normalization preprocessing
1 Asked on June 21, 2021
0 Asked on June 21, 2021 by rachithr
1 Asked on June 20, 2021 by malyada-n
0 Asked on June 20, 2021 by mkerrig
computer vision feature engineering feature extraction nlp web scraping
1 Asked on June 20, 2021 by don-draper
0 Asked on June 20, 2021 by snowflakebladerunner
2 Asked on June 20, 2021 by the-dan
1 Asked on June 20, 2021 by nick-smith
0 Asked on June 19, 2021 by mara
Get help from others!
Recent Answers
© 2022 AnswerBun.com. All rights reserved. Sites we Love: PCI Database, MenuIva, UKBizDB, Menu Kuliner, Sharing RPP