TransWikia.com

Machine Learning algorithm for predicting number of cases in pandemic

Data Science Asked by Aníbal Sánchez Numa on April 15, 2021

I'm taking my first steps with AI and machine learning and have run into the following issue. I'm trying to predict the number of confirmed COVID-19 cases from the day number using the scikit-learn library. That is, my input is the number of days since the pandemic started in my country and my output is the number of confirmed cases on the corresponding date. However, with both GradientBoosting and RandomForest I get the same output value for every test input. I'm posting the Python code below, as it is very short:

import numpy as np
import pandas
from sklearn import ensemble

# Load the semicolon-separated data set
datos = pandas.read_csv('covid.csv', sep=";")

# Input: day number since the pandemic started; output: confirmed cases
entrada = np.array(datos['ORDEN']).reshape(-1, 1)
salida = datos["CASOS"]

# Day numbers to predict
test = np.arange(63, 70).reshape(-1, 1)

regr = ensemble.GradientBoostingRegressor(random_state=0, n_estimators=500).fit(entrada, salida)
print(regr.predict(test))

regr = ensemble.RandomForestRegressor(random_state=0, n_estimators=500).fit(entrada, salida)
print(regr.predict(test))

My output is this:

[1782.99976513 1782.99976513 1782.99976513 1782.99976513 1782.99976513
 1782.99976513 1782.99976513]
[1773.99 1773.99 1773.99 1773.99 1773.99 1773.99 1773.99]

What am I doing wrong?? Thanks in advance.

One Answer

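The result depends heavily on your feature engineering. With the day number as the only feature, your model is most likely predicting a single constant for all test days. Tree-based ensembles cannot extrapolate beyond the range of inputs they were trained on, so if days 63 to 69 lie after the last training day, they all end up in the same leaves and get the same value. That value is usually close to the last observed counts rather than the mean or median of your target.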
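One way to check what the model is doing is to compare its predictions for unseen days with its prediction for the last training day. The sketch below is not the asker's data: it assumes a made-up, steadily increasing series for days 1 to 62 (the names `days`, `cases` and `future` are illustrative only). Tree-based regressors predict from the leaves learned during training, so inputs beyond the training range should all receive the same value as the last training day.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Made-up stand-in for covid.csv: days 1..62 with an increasing cumulative count.
days = np.arange(1, 63).reshape(-1, 1)
cases = np.cumsum(np.arange(1, 63))  # hypothetical values, not real data

# Days after the end of the training range, as in the question.
future = np.arange(63, 70).reshape(-1, 1)

models = {
    "GradientBoosting": GradientBoostingRegressor(n_estimators=100, random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(days, cases)
    last_train_pred = model.predict([[62]])[0]  # prediction for the last training day
    future_pred = model.predict(future)         # predictions for unseen days
    results[name] = (last_train_pred, future_pred)
    print(name, last_train_pred, future_pred)
```

If the future predictions all equal the prediction for day 62, the model is not broken. It simply has no way to continue the trend past the data it has seen.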

It might also help to try other kinds of models. Since you are trying to predict the count of an event over a given period of time, Poisson models could be useful. They are in an experimental phase in sklearn; nonetheless, the documentation might help you understand how the model works.
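As a rough illustration of the Poisson suggestion, here is a minimal sketch using `PoissonRegressor` from `sklearn.linear_model` (available since scikit-learn 0.23). The data is again made up, with an assumed roughly exponential growth, and the `alpha` and `max_iter` settings are illustrative choices rather than tuned values. Because the model uses a log link, its predictions keep changing with the day number instead of flattening out after the last training day.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Made-up stand-in for covid.csv: counts growing roughly exponentially over 62 days.
days = np.arange(1, 63).reshape(-1, 1)
cases = np.round(10 * np.exp(0.05 * days.ravel()))  # hypothetical values, not real data

# alpha=0.0 disables the L2 penalty; max_iter is raised to give the solver room to converge.
poisson = PoissonRegressor(alpha=0.0, max_iter=1000).fit(days, cases)

# Days after the end of the training range, as in the question.
future = np.arange(63, 70).reshape(-1, 1)
poisson_pred = poisson.predict(future)
print(poisson_pred)
```

A model like this assumes the expected count is an exponential function of the day number, which can hold in the early phase of an outbreak but will overshoot once growth slows, so long-range forecasts from it deserve caution.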

Answered by Julio Jesus on April 15, 2021
