
What is the best way to train a gradient boosted model on a binomial dataset where the number of "observations" for each instance varies?

Asked by lavendarlemur on Data Science, August 15, 2020

I have been trying to figure out the best way to train a gradient boosted model on a binomial dataset. To be clearer, my dataset is in a format similar to this:

[Screenshot of a toy dataset showing the format: https://i.stack.imgur.com/b8bRL.png]

The Goal is to predict the probability of "success" (# of successes/# of trials) for each row (set of predictor variables).

The Problem I am having is that I am not sure how to incorporate the # of trials (for each row) into the model. This is clearly important, since we have much more confidence in instances with more trials (100 successes in 100 trials is more trustworthy than 1 success in 1 trial). I know that to deal with this, binomial linear regression treats each trial as an instance (instead of treating the sum of trials with the same predictor variables as a single instance). I would like to mimic this approach in a gradient boosting framework.
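For reference, this is the behaviour I would like to mimic. A rough sketch of that GLM baseline in statsmodels (the toy data and column layout below are made up), where the binomial family accepts (successes, failures) counts directly, which gives the same fit as expanding every trial into its own row:

```python
import numpy as np
import statsmodels.api as sm

# Made-up predictors and counts, just to illustrate the data layout.
X = np.array([[0.2, 3], [1.5, 1], [-0.3, 4], [0.7, 2], [1.1, 5], [-0.8, 2]], dtype=float)
successes = np.array([5, 3, 80, 1, 9, 2])
trials = np.array([8, 4, 100, 6, 12, 10])

# A binomial GLM can be fit on (successes, failures) counts directly,
# which is equivalent to fitting on the fully expanded per-trial data.
endog = np.column_stack([successes, trials - successes])
model = sm.GLM(endog, sm.add_constant(X), family=sm.families.Binomial())
result = model.fit()

print(result.predict(sm.add_constant(X)))  # fitted success probabilities per row
```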

My Solutions: The first approach I am considering is using an XGBoost regressor to predict the % of successes in each row, setting the sample weight of each row equal to the number of trials in that row. Would this properly account for the fact that we have more confidence in rows with more trials?
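For concreteness, a minimal sketch of this first approach (same made-up column names and toy values as the GLM sketch above; I am assuming the reg:logistic objective just to keep predictions inside [0, 1]):

```python
import pandas as pd
from xgboost import XGBRegressor

# Made-up data: two predictors plus per-row success/trial counts.
df = pd.DataFrame({
    "x1": [0.2, 1.5, -0.3, 0.7, 1.1, -0.8],
    "x2": [3, 1, 4, 2, 5, 2],
    "successes": [5, 3, 80, 1, 9, 2],
    "trials": [8, 4, 100, 6, 12, 10],
})

X = df[["x1", "x2"]]
y = df["successes"] / df["trials"]  # target: observed success rate per row

# "reg:logistic" keeps predictions in [0, 1]; weighting each row by its
# number of trials makes high-count rows contribute more to the loss.
model = XGBRegressor(objective="reg:logistic", n_estimators=200)
model.fit(X, y, sample_weight=df["trials"])

print(model.predict(X))  # predicted success probabilities per original row
```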

My alternate approach is "expanding" the data so that each trial is its own row. So if one row had 5 successes out of 8 trials, I would expand it into 8 rows (5 successes and 3 failures). I would then use an XGBoost classifier to predict the probability of success for each instance/trial. Since all of the expanded rows that came from the same original row have identical predictor values, they get the same predicted probability, which I would take as the probability of success for the original row. This approach seems likely to work, but it is very inefficient.
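A rough sketch of this expansion approach (same made-up layout as above), mainly to show what I mean by expanding:

```python
import pandas as pd
from xgboost import XGBClassifier

# Same made-up aggregated data as before.
df = pd.DataFrame({
    "x1": [0.2, 1.5, -0.3, 0.7, 1.1, -0.8],
    "x2": [3, 1, 4, 2, 5, 2],
    "successes": [5, 3, 80, 1, 9, 2],
    "trials": [8, 4, 100, 6, 12, 10],
})

# Expand each aggregated row into one row per trial (1 = success, 0 = failure).
expanded = df.loc[df.index.repeat(df["trials"])].copy()
trial_idx = expanded.groupby(level=0).cumcount()
expanded["outcome"] = (trial_idx < expanded["successes"]).astype(int)

clf = XGBClassifier(n_estimators=200)
clf.fit(expanded[["x1", "x2"]], expanded["outcome"])

# All expanded rows from the same original row share identical predictors,
# so predicting on the original (unexpanded) rows gives the per-row probability.
print(clf.predict_proba(df[["x1", "x2"]])[:, 1])
```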

Which of these approaches would better account for the fact that we have more confidence in instances with more observations? Are there any other better ways to deal with this in a gradient boosting framework?
