Spline regression with many features in R

Cross Validated Asked by user2117258 on December 29, 2021

I have high-dimensional data that I’d like to fit a spline to, and then use the fit to predict values for a held-out set. I am currently fitting a linear regression model to my data with the glmnet R package:

cv_fit <- cv.glmnet(x = X_train, 
                    y = Y_train, 
                    alpha = 0, 
                    family = "gaussian", 
                    nfolds = 10, 
                    parallel = TRUE, 
                    type.measure = "mse") 

Here, my X_train contains approximately 70 data points with 10000 features, and my Y_train is a vector of 70 response values. This model works to some degree on the evaluation and test sets, but we think these data can be better modeled with some sort of polynomial regression. I came across spline regression, and I think it could be a good alternative. In the examples I see online, I’ve only seen splines used on 2-dimensional datasets, and I was curious to know whether there is any way to model these high-dimensional data with splines.

Any insight would be greatly appreciated!

One Answer

You could try using the bs function (which stands for "B-spline basis") from the splines package.

Suppose one of the variables in your data is called z, and you want to use a B-spline with 6 degrees of freedom. Then you can do the following:

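library(splines)
# glmnet needs a numeric matrix, and $ does not work on a matrix,
# so select the column by name (this assumes X_train has column names)
splined <- bs(X_train[, "z"], df = 6)
# Give the 6 basis columns unambiguous names
colnames(splined) <- paste0("z_bs", seq_len(ncol(splined)))
# Drop the original column z and append its basis columns
X_train <- cbind(X_train[, colnames(X_train) != "z", drop = FALSE], splined)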

Now you can use glmnet with the newly created X_train.
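
Since the goal is to predict on a held-out set, the held-out column needs to be expanded with the same knots that were chosen on the training data. The following is only a minimal sketch of one way to do that, not a definitive recipe. The simulated data, the X_test matrix, and the helper replace_with_basis are assumptions made for illustration and do not come from the question. The glmnet step is skipped if that package is not installed.

```r
library(splines)

set.seed(1)
# Simulated stand-in data (assumption): 70 training rows, 20 held-out rows, 5 features
feature_names <- c("z", paste0("x", 1:4))
X_train <- matrix(rnorm(70 * 5), nrow = 70, dimnames = list(NULL, feature_names))
X_test  <- matrix(rnorm(20 * 5), nrow = 20, dimnames = list(NULL, feature_names))
Y_train <- sin(2 * X_train[, "z"]) + rnorm(70, sd = 0.1)

# Build the basis on the training column; the knots are chosen from the training data
splined <- bs(X_train[, "z"], df = 6)

# Evaluate the same basis (same knots) at the held-out values.
# R may warn if some held-out values fall outside the training range.
splined_test <- predict(splined, newx = X_test[, "z"])

# Hypothetical helper: swap column `name` of X for the columns of `basis`
replace_with_basis <- function(X, basis, name) {
  colnames(basis) <- paste0(name, "_bs", seq_len(ncol(basis)))
  cbind(X[, colnames(X) != name, drop = FALSE], basis)
}

X_train_s <- replace_with_basis(X_train, splined, "z")
X_test_s  <- replace_with_basis(X_test, splined_test, "z")

# Fit the ridge model as in the question and predict on the held-out rows
if (requireNamespace("glmnet", quietly = TRUE)) {
  cv_fit <- glmnet::cv.glmnet(x = X_train_s, y = Y_train, alpha = 0,
                              family = "gaussian", nfolds = 10,
                              type.measure = "mse")
  Y_pred <- predict(cv_fit, newx = X_test_s, s = "lambda.min")
}
```

Calling bs again on the held-out column would choose new knots from the held-out values, so the columns would no longer mean the same thing as in training; predict on the fitted basis avoids that. Note also that expanding every one of 10000 features this way multiplies the column count by the chosen df, so it may be more practical to expand only the features suspected of having a nonlinear effect.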

Answered by Willem on December 29, 2021
