TransWikia.com

Low memory error while performing degree 2 polynomial regression on (3000*1835) sized array

Data Science Asked by V K on September 25, 2021

I am working on a problem to predict the revenue a film will generate. Some of the features in the data set are JSON collections for the crew and cast who worked on the film. I applied one-hot encoding to these columns.
As a result, I have a (3000*1835) sized array, and even that only after extracting the director's data from the 'Crew' column and applying PCA with 60% variance retention.
But when I apply polynomial regression, I get the error below:

lib\site-packages\sklearn\model_selection\_validation.py:532:
FitFailedWarning: Estimator fit failed. The score on this train-test
partition for these parameters will be set to nan. Details:
MemoryError: Unable to allocate 30.2 GiB for an array with shape
(2400, 1686366) and data type float64

I am using the code shown below for polynomial regression:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

polyFeature = PolynomialFeatures(degree=2)
linearRegression = LinearRegression()
pipeline = Pipeline([('polyFeature', polyFeature), ('linearRegression', linearRegression)])
score = cross_val_score(pipeline, XTrain, YTrain, n_jobs=4, cv=5)

I am using a system with 6 cores, 32 GB RAM.

One Answer

You are allocating a float64 array of shape (2400, 1686366) (very big!), so the memory error is no surprise. The 1,686,366 columns are not a bug upstream of the model: that is exactly the number of degree-2 polynomial features generated from your 1,835 input columns. PolynomialFeatures(degree=2) emits a bias term, every original feature, every square, and every pairwise product, i.e. C(1835 + 2, 2) = 1,686,366 columns. With cv=5, each training fold has 2,400 rows, and 2400 × 1686366 float64 values is about 30.2 GiB for a single fold, before n_jobs=4 tries to run several folds in parallel on your 32 GB machine.
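You can sanity-check where the 1,686,366 and the 30.2 GiB come from with a few lines of arithmetic (a sketch, assuming the 1,835 input columns and degree 2 from the question):

```python
from math import comb

n_features, degree = 1835, 2

# PolynomialFeatures(degree=2) produces the bias term, all original
# features, all squares, and all pairwise products: C(n + d, d) columns.
n_poly = comb(n_features + degree, degree)
print(n_poly)  # 1686366

# Memory for one cv fold's transformed training matrix (float64 = 8 bytes).
rows = 2400
gib = rows * n_poly * 8 / 2**30
print(round(gib, 1))  # 30.2
```

This matches both numbers in the traceback, which confirms the blow-up happens inside PolynomialFeatures rather than in the preprocessing.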

Also, are you sure you need the float64 data type? You can halve the memory consumption by using float32 instead of float64, unless you really need that precision.
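A minimal sketch of the cast, assuming XTrain is a NumPy array (the name is taken from the question; the random data here is only a stand-in):

```python
import numpy as np

# Hypothetical stand-in for the real training matrix from the question.
XTrain = np.random.rand(3000, 1835)      # NumPy defaults to float64

# Casting to float32 halves the memory footprint; the pipeline then
# builds its polynomial features in float32 as well.
XTrain32 = XTrain.astype(np.float32)

print(XTrain.nbytes // XTrain32.nbytes)  # 2
```

Even so, half of 30.2 GiB per fold is still a lot; lowering the feature count before the polynomial expansion is the more durable fix.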

Answered by black_cat on September 25, 2021
