Scipy curve_fit and method "dogbox"

Data Science Asked by zipline86 on December 3, 2020

I am trying to duplicate this paper's feature engineering for user activity. They take 14 days of accumulated user activity and keep the two parameters that fit a sigmoid to it. I would like to do the same, except with 7 days of activity. http://hanj.cs.illinois.edu/pdf/kdd18_cyang.pdf

They use the formula below and keep the parameters x0 and k as features.

from scipy.optimize import curve_fit
import numpy as np

def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y

I used scipy curve_fit to find these parameters as follows

ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata)), ydata, maxfev=20000)

When I had a user that had the values below, I had the following error:

ydata1 = [0,0,0,0,0,91,91]

RuntimeError: Optimal parameters not found: gtol=0.000000 is too small func(x) is orthogonal to the columns of the Jacobian to machine precision.

I noticed that if I add the method ‘dogbox’ I no longer get the error.

ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata1)), ydata1, maxfev=20000, method='dogbox')
print(ppov[0], ppov[1])
5.189237217957538 11.509279446215949

However, I played around with other values and noticed that the resulting parameters can have very different values.

For example, if I have the values

ydata2=[0,3,5,30,34,50,91]

ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata2)), ydata2, maxfev=20000)
print(ppov[0], ppov[1])
-24.681668846480264 118.77183210605865

However, if I add the method=’dogbox’ I get very different k and x0 parameter values.

ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata2)), ydata2,  maxfev=20000, method='dogbox')
print(ppov[0], ppov[1])
0.28468096463676695 8.154477352500013

Can anybody help me with 2 things:

  1. I read the doc about ‘dogbox’ and don’t really understand it. Can anybody explain it more simply?

  2. The curve_fit scipy function is looping through about 100,000 users and I need to set the parameters of the curve_fit so it does not throw an error. Is using the ‘dogbox’ method okay for my purposes knowing that the parameter results seem very different between the ‘dogbox’ and default ‘lm’ method? Or, are there other arguments in the curve_fit function that I could set instead that will help me get past this error?

One Answer

I can't speak to the dogbox algorithm, but the sigmoid only has range (0,1), so fitting it to your example data, which goes up to 91, is sure to be bad. The paper you reference presumably scales the activity values first.
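
A minimal sketch of that scaling idea, using ydata2 from the question. It assumes each user's series is divided by its own maximum so the targets fall in [0, 1]; the paper's actual normalization is not given here, so both the max-scaling and the starting guess p0 are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))


ydata2 = np.array([0, 3, 5, 30, 34, 50, 91], dtype=float)
xdata = np.arange(len(ydata2), dtype=float)

# Assumption: scale by the series maximum so the targets lie in [0, 1],
# which is the only range the sigmoid can reach.
y_scaled = ydata2 / ydata2.max()

# Assumption: start at the middle of the window with a moderate slope,
# rather than curve_fit's default starting point of all ones.
p0 = [xdata.mean(), 1.0]

popt, pcov = curve_fit(sigmoid, xdata, y_scaled, p0=p0, maxfev=20000)
x0_hat, k_hat = popt
print(x0_hat, k_hat)
```

With the targets in the sigmoid's range, x0 can be read as the day on which the fitted curve reaches half of the user's final activity, and k as how sharply activity ramps up around that day.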

The first example you give has a best fit that's a step function which can be approximated by the sigmoid with parameters going to infinity; so it's no surprise the algorithm wouldn't converge.
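
To illustrate why no finite optimum exists for that first example, here is a small sketch that evaluates the sum of squared errors for increasing k. It assumes ydata1 has been scaled by its maximum, and it fixes x0 at 4.5 (a hypothetical choice, halfway between the last 0 and the first 91).

```python
import numpy as np


def sigmoid(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))


# ydata1 from the question, scaled by its maximum (an assumption) to [0, 1].
y = np.array([0, 0, 0, 0, 0, 91, 91], dtype=float) / 91.0
x = np.arange(len(y), dtype=float)

x0 = 4.5  # hypothetical midpoint of the jump between day 4 and day 5
ks = [1.0, 5.0, 10.0, 50.0, 100.0]

# Sum of squared errors for each steepness value.
sse = [float(np.sum((y - sigmoid(x, x0, k)) ** 2)) for k in ks]

for k, s in zip(ks, sse):
    print(f"k={k:6.1f}  SSE={s:.3e}")
```

The error keeps shrinking as k grows, so the optimizer can always do slightly better by making the curve steeper, and the gradient flattens toward zero along the way.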

EDIT: Maybe you should try increasing the tolerances, which curve_fit forwards as keyword arguments to the underlying solver (leastsq for the default 'lm' method, least_squares for 'trf' and 'dogbox'); your error message mentions gtol specifically: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares
Or, if things are converging enough for your purposes, just catch and handle that error?
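
Both suggestions can be combined in a loop over users. The sketch below is one possible shape, not a tested recipe for the full 100,000-user dataset: fit_user is a hypothetical helper, and the max-scaling, the starting guess and the gtol value of 1e-8 are all assumptions. A failed fit is recorded as NaN instead of stopping the loop.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))


def fit_user(ydata, gtol=1e-8):
    """Return (x0, k) for one user, or (nan, nan) if no fit is found.

    Hypothetical helper: scaling, p0 and gtol are illustrative assumptions.
    """
    y = np.asarray(ydata, dtype=float)
    x = np.arange(len(y), dtype=float)
    if y.max() <= 0:
        # No activity at all: nothing to scale or fit.
        return (np.nan, np.nan)
    y = y / y.max()  # assumption: scale targets into [0, 1]
    try:
        # With the default method 'lm', gtol is forwarded to leastsq.
        popt, _ = curve_fit(sigmoid, x, y, p0=[x.mean(), 1.0],
                            maxfev=20000, gtol=gtol)
    except RuntimeError:
        # curve_fit raises RuntimeError when the optimizer gives up.
        return (np.nan, np.nan)
    return (float(popt[0]), float(popt[1]))


users = {
    "user_a": [0, 0, 0, 0, 0, 91, 91],
    "user_b": [0, 3, 5, 30, 34, 50, 91],
}
features = {name: fit_user(y) for name, y in users.items()}
print(features)
```

For step-like users such as user_a, a fit that does return may still report a very large k, so clipping k or flagging those users separately may be worth considering before using the values as features.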

Answered by Ben Reiniger on December 3, 2020
