TransWikia.com

Multi-item forecasting: issue with storing results

Data Science Asked by Murcielago on June 10, 2021

Disclaimer: I am not 100% sure that this is the appropriate place to ask this question.
Here is a little bit of context about the problem.

I have a dataset containing time series for about 1,000 products (about two years of data each).
From this data I am forecasting 12 months ahead, with several prediction intervals.

To predict, I built a basic model with statsmodels.tsa.statespace.exponential_smoothing.ExponentialSmoothing.
I am facing an issue regarding how to store the data once the model produces results at the item level.

Here is a basic reproduction of what my script looks like:


seasonal_profile_df.set_index('Id')
forecast_df = pd.DataFrame(seasonal_profile_df.index)

def winter_holts(i):
    fit1 = ExponentialSmoothing(new_df.iloc[:,i], trend=True, seasonal=12).fit()
    prediction_interval = fit1.get_forecast(steps=12).summary_frame(alpha=[0.10, 0.05, 0.01])
    forecast = pd.DataFrame(prediction_interval)
    return forecast

def holts(i):
    fit1 = ExponentialSmoothing(new_df.iloc[:,i], trend=True).fit()
    prediction_interval = fit1.get_forecast(steps=12).summary_frame(alpha=0.10)

    forecast = pd.DataFrame(prediction_interval)
    print("FORECAST",forecast)

    return forecast

for i in seasonal_profile_df.index:
    if seasonal_profile_df['trend'].loc[i] == "trending":
        holts(i)
    else:
        if seasonal_profile_df['seasonality'].loc[i] == "seasonal":
            winter_holts(i)

For each item, the forecasting function returns a dataframe that looks like this:

FORECAST 100221           mean    mean_se  mean_ci_lower  mean_ci_upper
2020-07-31  -4.412599  24.526896     -44.755753      35.930555
2020-08-31  -5.848380  24.526896     -46.191534      34.494775
2020-09-30  -7.284160  24.526898     -47.627317      33.058996
2020-10-31  -8.719941  24.526900     -49.063101      31.623220
2020-11-30 -10.155721  24.526903     -50.498887      30.187445
2020-12-31 -11.591502  24.526908     -51.934676      28.751672
2021-01-31 -13.027282  24.526915     -53.370468      27.315903
2021-02-28 -14.463063  24.526924     -54.806263      25.880137
2021-03-31 -15.898844  24.526935     -56.242062      24.444375
2021-04-30 -17.334624  24.526949     -57.677865      23.008617
2021-05-31 -18.770405  24.526966     -59.113674      21.572864
2021-06-30 -20.206185  24.526986     -60.549487      20.137117

Storing the computation results is a real issue right now because they are the basis for deeper analysis and will end up in a PostgreSQL DB.
I am inexperienced with this and am wondering how to handle the output as efficiently as possible, since I will need to manipulate it later in the script.

One Answer

You can store your data using pickle and then load it whenever you want.

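import pickle

forecast1 = [1, 2, 3]
forecast2 = [4, 5, 6, 6]

# Serialize both forecasts into a single file; "with" closes the file handle.
with open("forecasts.p", "wb") as f:
    pickle.dump([forecast1, forecast2], f)

# Load them back later, in the same order they were dumped.
with open("forecasts.p", "rb") as f:
    forecast1, forecast2 = pickle.load(f)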
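The same approach works for the per-item dataframes from the question. Below is a minimal sketch, assuming the results are kept in a dict keyed by item ID; the dict, the sample values, and the file name are made up for illustration and are not from the original script.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical per-item results: item ID -> forecast dataframe.
forecasts = {
    100221: pd.DataFrame({"mean": [-4.41, -5.85]}),
    100222: pd.DataFrame({"mean": [1.0, 2.0]}),
}

# Illustrative location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "forecasts_by_item.p")

# Dump the whole dict in one go.
with open(path, "wb") as f:
    pickle.dump(forecasts, f)

# Reload it later and look results up by item ID.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Keep in mind that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.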

Or you can directly store your pandas frames as .csv:

df.to_csv("forecasts.csv", header=True, index=False)
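With about 1000 items, one file per forecast gets unwieldy, so it may help to stack all per-item frames into a single long table first. The sketch below is one possible way to do that with pd.concat. Note that fake_forecast is a hypothetical stand-in for holts()/winter_holts() from the question (it produces made-up numbers, not a real model fit), and the column names item_id and date are my own choice.

```python
import pandas as pd


def fake_forecast(item_id, steps=12):
    """Hypothetical stand-in for holts()/winter_holts().

    Returns a frame shaped like the summary_frame output in the question,
    filled with made-up values (no model is fitted here).
    """
    # Month-start dates are used only to keep the example simple.
    dates = pd.date_range("2020-07-01", periods=steps, freq="MS")
    mean = [float(item_id % 7 - k) for k in range(steps)]
    return pd.DataFrame(
        {
            "mean": mean,
            "mean_ci_lower": [m - 40.0 for m in mean],
            "mean_ci_upper": [m + 40.0 for m in mean],
        },
        index=pd.Index(dates, name="date"),
    )


def collect_forecasts(item_ids, forecast_fn):
    """Run forecast_fn for every item and stack the results into one frame."""
    frames = {item_id: forecast_fn(item_id) for item_id in item_ids}
    # The dict keys become the outer index level ("item_id").
    stacked = pd.concat(frames, names=["item_id", "date"])
    # Turn both index levels into ordinary columns (long format).
    return stacked.reset_index()


all_forecasts = collect_forecasts([100221, 100222, 100223], fake_forecast)

# Without a path, to_csv returns the CSV text instead of writing a file.
csv_text = all_forecasts.to_csv(index=False)
```

Because the dates and item IDs are regular columns after reset_index, index=False does not lose them when writing the CSV. The same long table is also a natural fit for a single database table, for example through DataFrame.to_sql with a SQLAlchemy connection, if that is how you plan to load PostgreSQL.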

Answered by Shahriyar Mammadli on June 10, 2021
