
MLPRegressor Output Range

Data Science - Asked by Kam on August 18, 2020

I am using Scikit’s MLPRegressor for a timeseries prediction task.

My data is scaled between 0 and 1 using the MinMaxScaler and my model is initialized using the following parameters:

MLPRegressor(solver='lbfgs', hidden_layer_sizes=50,
                           max_iter=10000, shuffle=False, random_state=9876,
                           activation='relu')
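
For reference, the scaling step looks roughly like this (stand-in data shown here, not my actual series):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

series = np.random.randn(200).cumsum().reshape(-1, 1)  # stand-in time series
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(series)                  # values now in [0, 1]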

I am expecting output between 0 and 1 but am getting values outside these bounds (both negative values and values greater than 1).

Non-normalized data has the same problem: I get predictions out of range!

Any idea where I could be wrong?

UPDATE:
Based on the answers below, I played around with modifying the output activation layer and got some interesting results that I thought were worth sharing. There are three scenarios; I hope the captions convey the message clearly:
Legend:

Black solid line = Training Epoch

Red solid line = Test Epoch

Cyan dashed line = Network prediction over the entire data set

  1. Output when the network is trained using 'relu' activation but out_activation_ set to 'logistic'

  2. Output when the network is trained using 'relu' activation and out_activation_ set explicitly to 'relu'

  3. Output when the network is trained using 'relu' activation and out_activation_ left at its default

2 Answers

The default output activation of the scikit-learn MLPRegressor is 'identity', which simply passes the values it receives through unchanged.

As @David Masip mentioned in his answer, changing the final activation layer is what gives you a bounded output. Doing so in frameworks such as PyTorch, Keras and TensorFlow is fairly straightforward.
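
For example, here is a minimal Keras sketch (illustrative random data, not the OP's model; assumes TensorFlow is installed) where the final Dense layer gets a sigmoid activation, which squashes the regression output into (0, 1):

import numpy as np
import tensorflow as tf

X = np.random.random((100, 2)).astype("float32")
y = np.random.uniform(0, 1, size=(100, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # squashes output into (0, 1)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, verbose=0)

preds = model.predict(X)
print(preds.min(), preds.max())  # always inside (0, 1)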

Doing it in your code with the MLPRegressor means using an object attribute that isn't a standard constructor parameter, namely out_activation_.

Here are the built-in activation options that I can see in the documentation (this is the hidden-layer activation parameter, but the same names are accepted by out_activation_):

activation : {‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default ‘relu’

Activation function for the hidden layer.

‘identity’, no-op activation, useful to implement linear bottleneck, returns f(x) = x
‘logistic’, the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).
‘tanh’, the hyperbolic tan function, returns f(x) = tanh(x).
‘relu’, the rectified linear unit function, returns f(x) = max(0, x)

Setting its value to 'logistic' gives you the property you would like: values between 0 and 1.


EDIT (after comments and an update from the OP): in their case, using logistic (sigmoid) as the final activation negatively affected results. So it may be worth trying out all of the available activation functions to see which one best suits the model and the data.
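
A rough sketch of that kind of comparison, using random stand-in data (not the OP's series): fit once, then swap out_activation_ and inspect the prediction range for each choice.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.random_sample((100, 2)) * 100
y = rng.uniform(0, 1, 100)

nn = MLPRegressor(solver='lbfgs', hidden_layer_sizes=50, max_iter=10000,
                  shuffle=False, random_state=9876, activation='relu')
nn.fit(X, y)

for act in ['identity', 'logistic', 'tanh', 'relu']:
    nn.out_activation_ = act  # swap the output activation after fitting
    preds = nn.predict(X)
    print('{:>8s}: min={:.3f} max={:.3f}'.format(act, preds.min(), preds.max()))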

One further remark: at least within the context of deep learning, it is common practice not to use an activation at the final output of a regression network - for some thoughts around that discussion, see this thread.


That being said, below is a simple working example of a model that leaves out_activation_ at its default, and one that overrides it. I use random numbers to make it run, but the take-away is that the predicted values for the altered model are always within the range 0 to 1. Try changing the random seed and re-running the script.

import pandas as pd
import numpy as np
from sklearn.neural_network import MLPRegressor

# To see an example where output falls outside of the range of y
np.random.seed(1)

# Create the default NN as you did
nn = MLPRegressor(
    solver='lbfgs',
    hidden_layer_sizes=50,
    max_iter=10000,
    shuffle=False,
    random_state=9876,
    activation='relu')

# Generate some fake data
num_train_samples = 50
num_test_samples = 50
num_vars = 2

X = np.random.random((num_train_samples, num_vars)) * 100  # random numbers between 0 and 100
y = np.random.uniform(0, 1, (num_train_samples, 1))  # uniform numbers between 0 and 1

X_test = np.random.random((num_test_samples, num_vars)) * 100
y_test = np.random.uniform(0, 1, (num_test_samples, 1))

# Fit the network
nn.fit(X, y)

print('*** Before scaling the output via final activation:\n')

# Now see that the output activation is (by default) simply linear i.e. 'identity'
print('Output activation by default: {}'.format(nn.out_activation_))
predictions = nn.predict(X_test)

print('Prediction mean: {:.2f}'.format(predictions.mean()))
print('Prediction max: {:.2f}'.format(predictions.max()))
print('Prediction min: {:.2f}'.format(predictions.min()))


print('\n*** After scaling the output via final activation:\n')

# Need to recreate the NN
nn_logistic = MLPRegressor(
    solver='lbfgs',
    hidden_layer_sizes=50,
    max_iter=10000,
    shuffle=False,
    random_state=9876,
    activation='relu')

# Fit the new network
nn_logistic.fit(X, y)


# --------------- #
#  Crucial step!  #
# --------------- #

# Before making predictions, alter the attribute "out_activation_"
nn_logistic.out_activation_ = 'logistic'
print('New output activation: {}'.format(nn_logistic.out_activation_))

new_predictions = nn_logistic.predict(X_test)

print('Prediction mean: {:.2f}'.format(new_predictions.mean()))
print('Prediction max: {:.2f}'.format(new_predictions.max()))
print('Prediction min: {:.2f}'.format(new_predictions.min()))

Tested using Python 3.5.2.

Correct answer by n1k31t4 on August 18, 2020

Although your data is between 0 and 1, the predictions can fall outside of this range, as with any statistical model. This just shows that your model does not represent your data perfectly, but that is not the important point here. If you want to ensure that the outputs are between 0 and 1, you have to replace the relu output activation with another one, such as sigmoid or softmax. These activations ensure that your outputs will be between 0 and 1, although they might lead to other problems, such as vanishing gradients. What is usually advised is to use relu layers up until the last one, which has to be a sigmoid or a softmax.
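
As a quick illustration of why a sigmoid output is bounded (a tiny sketch, not part of the answer itself): it maps any real-valued input into the open interval (0, 1).

import numpy as np

def sigmoid(z):
    # logistic sigmoid: 1 / (1 + exp(-z)), always strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
print(sigmoid(z))  # all values strictly between 0 and 1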

Answered by David Masip on August 18, 2020
